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^ i Summary 

■ The modern design of industrial structures leads to very complex simulations characterized by nonlinearities, 

• ■ high heterogeneities, tortuous geometries... Whatever the modelization may be, such an analysis leads to 

' the solution to a family of large ill-conditioned linear systems. In this paper we study strategies to efficiently 

solve to linear system based on non-overlapping domain decomposition methods. We present a review of 
most employed approaches and their strong connections. We outline their mechanical interpretations as 
well as the practical issues when willing to implement and use them. Numerical properties are illustrated 
by various assessments from academic to industrial problems. An hybrid approach, mainly designed for 
multifield problems, is also introduced as it provides a general framework of such approaches. 



>: 

Q- 1 INTRODUCTION 

(N 

■ Hermann Schwarz (1843-1921) is often referred to as the father of domain decomposition 

25 '■ methods. In a 1869-paper he proposed an alternating method to solve a PDE equation 

set on a complex domain composed the overlapping union of a disk and a square (fig. [1]), 
giving the mathematical basis of what is nowadays one of the most natural ways to benefit 
the modern hardware architecture of scientific computers. 

In fact the growing importance of domain decomposition methods in scientific compu- 
^ ■ tation is deeply linked to the growth of parallel processing capabilities (in terms of number 

of processors, data exchange bandwidth between processors, parallel library efficiency, and 
of course performance of each processor). Because of the exponential increase of compu- 
tational resource requirement for numerical simulation of more and more complex physical 
phenomena (non-linearities, couplings between physical mechanisms or between physical 
scales, random variables...) and more and more complex problems (optimization, inverse 
problems...), parallel processing appears to be an essential tool to handle the resulting 
numerical models. 

Parallel processing is supposed to take care of two key-points of modern computations, 
the amount of operations and the required memory. Let us consider the simulation of 
a physical phenomenon, classically modeled by a PDE C{x) = f, x £ H{il.). To take 
advantage of the parallel architecture of a calculator, a reflexion has to be carried out 
on how the original problem could be decomposed into collaborating subprocesses. The 
criteria for this decomposition will be: first, the ability to solve independent problems (on 
independent processors); second, how often processes have to be synchronized; and last 
what quantity of data has to be exchanged when synchronizing. When tracing back the 
idle time of resolution processes and analyzing hardware and software solutions, it is often 
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Figure 2. 16 subdomains bitraction test specimen 
(courtesy of ONERA - Pascale Kanoute) 

observed that inter-processor communications are the most penahzing steps. 

If we now consider the three great classes of mathematical decomposition of our ref- 
erence problem which are operator splitting (for instance L = ^ ^ Ci [Uj ) , function-space 
decomposition (for instance H{Q) = span{vi), an example of which is modal decomposi- 
tion) and domain decomposition (0 = |J fij), though the first two can lead to very elegant 
formulations, only domain decompositions ensure (once one subdomain or more have been 
associated to one processor) that independent computations are limited to small quanti- 
ties and that the data to exchange is limited to the interface (or small overlap) between 
subdomains which is always one-to-one exchange of small amount of data. 

So domain decomposition methods perfectly fit the criteria for building efficient algo- 
rithms running on parallel computers. Their use is very natural in engineering (and more 
precisely design) context : domain decomposition methods offer a framework where different 
design services can provide the virtual models of their own parts of a structure, each as- 
sessed independently, domain decomposition can then evaluate the behavior of the complete 
structure just setting specific behavior on the interface (perfect joint, unilateral contact, 
friction). Of course domain decomposition methods also work with one-piece structure (for 
instance fig. [2]), then decomposition can be automated according to criteria which will be 
discussed later. 

Prom an implementation point of view, programming domain decomposition methods 
is not an overwhelming task. Most often it can be added to existing solvers as a upper level 
of current code using the existing code as a black-box. The only requirement to implement 
domain decomposition is to be able to detect the interface between subdomains and use 
a protocol to share data on this common part. In this paper we will mostly focus on 
domain decomposition methods applied to finite element method |il5^ il09j . anyhow they 
can be applied to any discretization method (among others meshless methods El [73] and 
discrete element methods [201 El [22] ) • 

Though domain decomposition methods were more than one century old, they had not 
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been extensively studied. Recent interest arose as they were understood to be well-suited to 
modern engineering and modern computational hardware. An important date in reinterest 
in domain decomposition methods is 1987 as first international congress dedicated to these 
methods occurred and DDM association was created (see http://www.ddm.org). 

Yet the studies were first mathematical analysis oriented and emphasized on Schwarz 
overlapping family of algorithms. As interest in engineering problems grew, non-overlapping 
Schwarz and Schur methods, and coupling with discretization methods (mainly finite ele- 
ment) were more and more studied. Indeed, these methods are very natural to interpret 
mechanically, and moreover mechanical considerations often resulted in improvement to 
the methods. Basically the notion of interface between neighboring subdomains is a strong 
physical concept, to which is linked a set of conservation principles and phenomenological 
laws: for instance the conservation of fluxes (action-reaction principle) imposes the point- 
wise mechanical equilibrium of the interface and the equality of incoming mass (heat...) 
from one subdomain to the outgoing mass (heat...) of its neighbors; the "perfect interface" 
law consists in supposing that displacement field (pressure, temperature) is continuous at 
the interface, contact laws enable disjunction of subdomains but prohibit interpenetration. 

In this context two methods arose in the beginning of the 90's : so-called Finite Element 
Tearing and Interconnecting (FETI) method [H] and Balanced Domain Decomposition 
(BDD) [10_4j. From a mechanical point of view BDD consists in choosing the interface 
displacement field as main unknown while FETI consists in privileging the interface effort 
field. BDD is usually referred to as a primal approach while FETI is a dual approach. One 
of the interests of these methods, beyond their simple mechanical interpretation, is that 
they can easily be explained from a purely algebraic point of view (ie directly from the 
matrix form of the problem). In order to fit parallelization criteria, it clearly appeared that 
the interface problem should be solved using an iterative solver, each iteration requiring 
local {ie independent on each subdomain) resolution of finite element problem, which could 
be done with a direct solver. Then these methods combined direct and iterative solver 
trying to mix robustness of the first and cheapness of the second. Moreover the use of an 
iterative solver was made more efficient by the existence of relevant preconditioners (based 
on the resolution of a local dual problem for the primal approach and a primal local problem 
for the dual approach). 

When first released, FETI could not handle fioating substructures (ie substructures 
without enough Dirichlet conditions), thus limiting the choice of decomposition, while the 
primal approach could handle such substructures but with loss of scalability (convergence 
decayed as the number of fioating substructures increased). A key point then was the intro- 
duction of rigid body motions as constraints and the use of generalized inverses. Because 
of its strong connections with multigrid methods [108], the rigid body motions constraint 
took the name of " coarse problem" , it made the primal and dual methods able to handle 
most decompositions without loss of scalability [431 ITU llU2j . From a mechanical point of 
view, the coarse problem enables non-neighboring subdomains to interact without requiring 
the transmission of data through intermediate subdomains, it then enables to spread global 
information on the whole structure scale. 

Once equipped with their best preconditioners and coarse problems, mathematical re- 
sults [59l[65l[10] provide theoretical scalability of the methods. For instance for 3D elasticity 
problems, if h is the diameter of finite elements and H the diameter of subdomains, condi- 
tion number k of the interface problem reads (C is a real constant): 





which proves that the condition number only depends logarithmatically on the number of 
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elements per subdomain. Many numerical assessment campaigns confirmed the good prop- 
erties of the methods, their robustness compared to iterative solvers applied to the complete 
structure and their low cost (in terms of memory and CPU requirements) compared to di- 
rect solvers. Thus because they are well-suited to modern hardware (like PC clusters) they 
enable to achieve computations which could not be realized on classical computers because 
of too high memory requirement or too long computational time: these methods can handle 
problems with several millions of degrees of freedom. 

Primal and dual methods were extended to heterogeneous problems by a cheap interven- 
tion on the preconditioners [91] and on the initialization [60] . and to forth order elasticity 
(plates and shells) problems by the adjunction of so-called "second level problem" in order 
to regularize the displacement field around the corners of subdomains |38[ \1{)3\ l4Uj . As 
it became clear that the regularization of the displacement field was sufficient to suppress 
rigid body motions, specific algorithms which regularized a priori the subdomain problems 
were proposed: FETIDP [34J and its primal counterpart BDDC [19j, first in the plates and 
shells context, then in the second order elasticity context [63]. Now FETIDP and BDDC 
are considered as efficient as original FETI and BDD. 

Methods were employed in many other contexts: transient dynamics [SOj, multifield 
problems (multiphysic problems such as porous media [SB] and constrained problems such 
as incompressible flows [TQl |55] ) , Helmotz equations [HI [371 ISS] and contact [271 126] • The 
use of domain decomposition methods in structural dynamic analysis is a rather old idea 
though it can now be confronted to well established methods in static analysis; the Craig- 
Bampton algorithm [T7] is somehow the application of the primal strategy to such problems, 
the dual version of which was proposed in [90j, moreover ideas like the adjunction of coarse 
problems enabled to improve these methods. 

Because of the strong connection between primal and dual approaches, some meth- 
ods try to propose frameworks which generalize the two methods. The hybrid approach 
[6T| [57] enables to select specific treatment (primal or dual) for each interface degree of 
freedom; if all degrees of freedom have the same treatment, the hybrid approach is exactly 
a classical approach. For certain multifield problems the hybrid approach enables to define 
physic-friendly solvers. The hybrid approach can also be obtained from specific optimality 
considerations [24J . Mixed approaches [6U[ [5U1 |2H] consist in searching a linear combination 
of interface displacement and effort field, depending on the artificial stiffness introduced on 
the interface one can recover the classical approaches (null stiffness for the dual approach, 
infinite stiffness for the primal approach). Moreover, the mixed approaches enable to pro- 
vide the interface mechanical behavior and provide a more natural framework to handle 
complex interfaces (contact, friction...) than classical approaches. 

In this paper we aim at reviewing most of non-overlapping domain decomposition meth- 
ods. To adopt a rather general point of view we introduce a set of notations strongly linked 
to mechanical considerations as it gives the interface the main role of the methods. We try 
to include all methods inside the same pattern so that we can easily highlight the connec- 
tions and differences between them. We adopt a practical point of view as we describe the 
mechanical concepts, the algebraic formulations, the algorithms and the practical imple- 
mentation of the methods. At each step we try to emphasize keypoints and not to avoid 
theoretical and practical difficulties. 

This paper is organized as follows. In section [2] we introduce the mechanical framework 
of our study, the common notations and the notion of interface assembly operators and 
mechanical operators which will play a central role in the methods. Section [3] provides 
a rather extensive review of the nonoverlapping domain decomposition methods in the 
framework of discretized problems: basic primal and dual approaches (with their variations), 
three-field method for conforming grids, mixed and hybrid approaches. A keypoint of the 
previous methods is the adjunction of optional constraints to form a "coarse problem" which 
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transmits global data through the whole structure, the strategies to introduce these optional 
constraints are studied in section HI which leads to the definition of "recondensed" strategies 
FETIDP and BDDC. Section O deals with practical issues which are very often common 
to most of the methods. Assessments are given in section [6] to illustrate the methods and 
outline their main properties. Section [7] concludes the paper. As Krylov iterative solvers 
are often coupled to domain decomposition methods, main concepts and algorithms to use 
them are given in appendix [Al 



2 Formulation of an interface problem 

To present as smoothly as possible non-overlapping domain decomposition methods we 
first consider a reference continuous mechanics problem, decompose the domain in two 
subdomains in order to introduce interface fields, then in order to describe correctly the 
interface we study a A^-subdomain decomposition. Since our aim is not to prove theoretical 
performance results but make the reader feel some key-points of substructuring, we do not 
go too far in continuous formulation and quickly introduce discretized systems. 



2.1 Reference problem 



/ 




Figure 3. Reference problem 

Let US consider domain 0, in M" (n=l, 2 or 3) submitted to a classical linear elasticity 
problem (see figure [3]) : displacement uq is imposed on part du^ of the boundary of the 
domain, effort g is imposed on complementary part dg^l, volumic effort / is imposed on 17, 
elasticity tensor is o [53\ [T6] . The system is governed by the following equations : 

in Q 
in 

in n (2) 

on dg^l 
on dui^ 

In order to have the problem well posed, we suppose mes{du^) > 0. We also suppose 
that tensor a defines a symmetric definite positive bilinear form on 2nd-order symmetric 
tensors. Under these assumptions, problem ([2]) has a unique solution |i28] . 



div(a) + / = 
a = a : e{u) 

< £{u) = I ^grad(n) -|- grad(u) 

W.n = g 

u = uo 
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2.2 Two-subdomain decomposition 




Figure 4. Two-subdomain decomposition 

Let US consider a partition of domain into 2 substructures $7*^^^ and We define 

interface T between substructures (figure H]) : 



System ^ is posed on domain J7, we write its restrictions to 1^(1) and 17(2) . 



(3) 



1 or 2, < 



di^(^^'^)+/'^ =0 
= a(^) :f(MW) 

(^) 





in 

in O(^) 
in 

on 5^^2 0517^"^ 
on 5^17 fl^^^"-' 



(4) 



and the interface connection conditions, continuity of displacement 

= on T 

and equilibrium of efforts (action-reaction principle) 

^«^(i)+^(2y2)=o onT 
Of course system ([U El [6]) is strictly equivalent to global problem 



(5) 



(6) 



2.3 N-subdomain decomposition 

Let us consider a partition of domain $7 into subdomains denoted We can define 

the interface between two subdomains, the complete interface of one subdomain and the 
geometric interface at the complete structure scale: 



(7) 



When implementing the method, one (possibly virtual) processor is commonly assigned 
to each subdomain, hence because we can tell "local" computations (realized independently 
on each processor) from "global" computations (realized by exchanging data between pro- 
cessors), we often refer to values as being global or local. Then T^^^ is the local interface 
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(a) Primal (geometric) interface (b) Dual interface (connectiv- 

ity) 



Definition of the interface for a Ai'-subdomain decomposition 

and T the global interface. Because exchanges are most often one-to-one, T(*J) is the 
(i — j)-communication interface. 

Using more than two subdomains (except when using "band" -decomposition) leads to 
the appearance of "multiple-points" also called "crosspoints" (which are nodes shared by 
more than two subdomains). These crosspoints lead to the existence of two descriptions of 
the interface (figure [5|): so-called geometric interface T and so-called connectivity interface 
made out of the set of one-to-one interfaces (T(*'-'))i^j<j^^. Each of the two most classical 
methods exclusively uses one of these descriptions so the geometric interface T is often 
referred to as the primal interface while the connectivity interface T is referred to as the 
dual interface. 

How crosspoints are handled is a fundamental key in the differentiation of domain 
decomposition methods. In the remainder of the paper we will always refer to data attached 
to the dual interface using underlined notation. 

Remark 2.1 Reader may have observed that the above presented connectivity description is 
redundant for crosspoints: let x be a crosspoint if x belongs to T^"*^'^) and T^^''^^, it of course 
belongs to T^"^'^^ In the general case of a m-multiple crosspoint, there are C^) connectivity 
relationships while only (m— 1) would be sufficient and necessary. We will present strategies 
to remove these redundancies in the algebraic analysis of the methods. 

Remark 2.2 Cross-points may also introduce, at the continuous level, punctual-interfaces 
in 2d or edge-interfaces in 3d, which are interface with zero measure. Most often from a 
physical point of view these are not considered as interfaces. Anyhow after discretization 
all relationships are written node-to-node and the problem no longer exists. 

2.4 Discretization 

We suppose that the reference problem has been discretized, leading to the resolution of 
n X n linear system: 

Ku = f (8) 

Because of its key role in structural mechanics we will often refer to finite element dis- 
cretization though any other technique would suit. The key points are the link between 
matrix K and the domain geometry and the sparse filling of matrix K (related to the fact 
that only narrow nodes have non-zero interaction). 
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We restrict to the case of element-oriented decompositions (each element belongs to one 
and only one substructure) which are conforming to the mesh which implies three conditions 

m- 

• there is a one-to-one correspondence between degrees of freedom by the interface ; 

• approximation spaces are the same by the interface ; 

• models (beam, shell, 3d...) are the same by the interface. 

Under these assumptions connection conditions simply write as node equalities. For non- 
conforming meshes, a classical solution is to use mortar-elements for which continuity and 
equilibrium are verified in a weak sense [U [21 llOlj . 

2.4-1 Boolean operators 

In order to write communication relation between subdomains we have to introduce several 
operators. 

The first one is the "local trace" operator t^^^ which is the discrete projection from 
ri*^*) to T^**). It enables to cast data from a complete subdomain to its interface, and once 
transposed to extend data set on the interface to the whole subdomain (setting internal 
degrees of freedom to zero). In the remainder of the paper we will use subscript b for 
interface data and subscript i for internal data. 

Then data lying on one subdomain interface has to be exchanged with its neighboring 
subdomains. It can be either realized on the primal interface or the dual interface, leading 
to two (global) "assembly" operators: the primal one A^^\ and the dual one A^'\ The 
primal assembly operator is a strictly boolean operator while the dual assembly operator 
is a signed boolean operator (see figure [6]): if a degree of freedom is set to 1 on one side of 
the interface, its corresponding degree of freedom on the other side of the interface is set to 
— 1. Non-boolean assembly operators can be used in order to average connection conditions 
when using non-conforming domain decomposition methods [6]. 



Remark 2.3 Our {t^'^\ A^'^\ A^^^) set of operators is not the most commonly used in papers 
related to domain decomposition. The interest of this choice is to be sufficient to explain 
most of the available strategies with only three operators. Other notations use "composed 

operators" like fS(^) = A^'k^"^ or L^"^ = A'-'h^"^ ) which are not sufficient to describe all 
methods and which, in a way, omit the fundamental role played by the interface. 



Boolean operators have important classical properties. Please note the first one which 
expresses the orthogonality of the two assembly operators. 

^^(s)^W^ = (9a) 
s 



^(^)^^(s) = diag(multiphcity - i; 



Y{s) 



/ on T(^) 
elsewhere 



(9b) 
(9c) 

(9d) 
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Remark 2.4 An interesting choice of description would have been to use redundant local 
interface (defining some kind oft^^^). This choice would stick to most classical implemen- 
tations where the local interface of one subdomain is defined neighborwise. Dual assembly 
operator would write easily as a simple signing operator, but handling multiple points would 
be slightly more difficult for the primal assembly operator (see section [57S\) . 



Remark 2.5 Redundancies can easily be removed from the dual description of the inter- 
face. One has just to modify the connectivity table, so that one multiple point is connected 
only once to each subdomain. This can be carried out introducing two different assembly 
operators the "non-redundant" one and the " orthonormal" one (see figure^ JJ^I- Only re- 
lationship is modified (then A^'^^ A^^'^ = I~f{s)). The interest of the use of these assembly 
operators will be discussed in section [373[ 
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(a) Local interface 





(b) Redundant connectivity 
3 Y 




(c) Non-redundant connectivity 



(d) Orthonormal connectivity 
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Figure 7. Suppressing redundancies of dual interface 



2.4-2 Basic equations 

In order to rewrite equation ([8|) in a domain-decomposed context, we have to introduce 
the reaction unknown which is the discretization of 

=(1)^(1) = _^(2)^(2) 

equation ([6]). 

A^^^ is the reaction imposed by neighboring subdomains on subdomain (s). Commonly 
A^*-* is defined on the whole subdomain (s) while it is non-zero only on its interface, so 
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Vs, ifWn(^) = /(^) + A(") (10a) 
^^WtWuW=0 (10b) 

s 

^^Wi{^)AW = (10c) 

s 

Equation (llOap corresponds to the (local) equilibrium of each subdomain submitted to 
external conditions f^'^^ and reactions from neighbors A^''^ Equation (llObp corresponds 
to the (global) continuity of the displacement field through the interface. Equation (jlOcP 
corresponds to the (global) equilibrium of the interface (action-reaction principle). 

This three-equation system (jlOp is the starting point from a rich zoology of methods, 
most of which possess strong connections we will try to emphasis on. Before going further 
in the exploration of these methods, we propose to introduce local condensed operators 
that represent a subdomain on its interface, then a set of synthetic notations. 

2.4-3 Local condensed operators 

Philosophically, local condensed operators are operators that represent how neighboring 
subdomains "see" one subdomain: a subdomain can be viewed as a black-box, the only 
information necessary for neighbors is how it behaves on its interface. Associated to this 
idea is the classical assumption that local operations are "exactly" performed. From an 
implementation point of view, when solving problems involving local matrices, a direct 
solver is employed. As we will see, the use of exact local solvers will be coupled with the 
use of iterative global solvers leading to a powerful combination of speed and precision of 
computations. 

In this section we will always refer to the local equilibrium of subdomain (s) under 
interface loading: 

K(s)^is) ^ ^(s) ^ ^is)T^{s) ^^^^ 
(s) 

Primal Schur complement Sp : If we renumber the local degrees of freedom of subdo- 
main (s) in order to separate internal and boundary degrees of freedom, system (jlip 
writes 

I)] (12) 




From the first line we draw 



= -K^~'k\^^ (13) 



is) 

then the Gauss elimination of u] leads to 

[Kii - K^k\!^'"k^ 4'^ = SH'^ = Xi'^ (14) 
which is the condensed form of the local equilibrium of subdomains expressed in 

(s) 

terms of interface fields. Operator Sp is called local primal Schur complement. Its 

(s) 

computation is realized by the inversion of matrix K^- which corresponds to Dirichlet 
conditions imposed on the interface of subdomain (s), so the primal Schur complement 
is always well defined, and commonly called the "local Dirichlet operator". Note that 
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(s) 

the symmetry, positivity, and definition properties are inherited by matrix Sp from 
matrix 

An important result is that the kernel of matrices K^^'^ and Sp'' can be deduced one 

(s) 

from the other (/^ is the identity matrix on the interface): 

^(s)^(s) = ^ S^'h^'^R^'^ = S^'^rI'^ = (15) 
5^4^) =0 =^ l^-^^^p^*^^ = = (16) 

Primal Schur complement can also be interpreted as the discretization of the Stecklov- 
Poincare operator. From a mechanical point of view, it is the linear operator that 
provides the reaction associated to given interface displacement field. 

If we consider that the subdomain is also loaded on internal degrees of freedom, then 
the condensation of the equilibrium on the interface reads: 

= ft^-Kj^K^r'fl'^ (18) 

(s) 

bp is the condensed effort imposed on the substructure. 

(s) 

Dual Schur complement : the dual Schur complement is a linear operator that 
computes interface displacement field from given interface effort. From equation (jlip 
and (|14p we have: 

where K'^^'^'^ is the generalized inverse of matrix K^^'^ , and where it is assumed that 
no rigid body motion is excited. If we denote by R^'^^ the kernel of matrix K^'^^ this 
last condition reads: 

Il(s)T^(s) ^ Q equivalents = (20) 

Remark 2.6 A generalized inverse (or pseudo-inverse) of matrix M is a matrix, denoted 
M"^ , which verifies the following property: Vy G range(M), MM^y = y. Note that this 
definition leads to non-unique generalized inverse, however all results presented are inde- 
pendent of the choice of generalized inverse. 

Of course in order to take into account, inside (|19|) . the possibility of the substructure to 
have zero energy modes, an arbitrary rigid displacement can be added leading to the next 
expression where vector a^^^ denotes the magnitude of rigid body motions: 

= + 4^)a(^) (21) 

(s) 

Hybrid Schur complement: S^J^ this operator corresponds to an interface where de- 
grees of freedom are partitioned into two subsets. Suppose that the first subset is 

is) 

submitted to given Dirichlet conditions and the second to Neumann condition, S^J^ is 
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the linear operator that associates resulting reaction on the first subset and resulting 
displacement on the second subset to those given conditions. We denote by subscript 
p data defined on the first subset and by subscript d data defined on the second subset 
(schematically b = pL) d and pD d = 0). 



(22) 



(s) 

The computation of operator S^J though no more complex than the computation of 

(s) (s) - 

Sp or , requires more notations. A synthetic option is to denote by subscript p 
the sets of internal (subscript i) data and second-interface-subset (subscript d) data, 
schematically p = iU d. We introduce a modified trace operator: 

= (l 1 = (23) 



Vd 



Then internal equilibrium (jlip reads: 





(24) 



Then hybrid Schur complement is: 



\ ^PP ^PP ''d ^pp ''d / 

(s) 

As can be noticed the diagonal blocks of S^J look like fully primal and fully dual 

Schur complements, while extradiagonal blocks are antisymmetric (assuming IS 
symmetric). Of course if all interface degrees of freedom belong to the same sub- 
set, the hybrid Schur complement equals "classical" fully primal or fully dual Schur 
complement. Moreover it stands out clearly that: 



o{^)^ ^ ^ ''P ^^dd ^^dd ^^dd 126) 

Ud^dd ^P ^dd ^dd^dd ^dd. 



(s) 

S^p is the operator which associates displacement on the first subset and reaction on 
the second subset to given effort on the first subset and given displacement on the 
second subset. 

(s) (s) 

As both matrices Kpp and K^J may not be invertible, only their pseudo-inverse has 
been introduced. The invertibility is strongly dependent on the choice of interface 
subsets. 

2.4-4 Block notations 

While condensed operators simplify the writing of local operations, the block notations 
make it easer to understand the global operations of domain decomposition. We propose to 
denote by superscript the row-block repetition of local vectors and the diagonal-block 
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repetition of matrices, block assembly operators are written in one row (column-block) and 
denoted by special font, for instance: 















= 




1 r = 




A^ = 





















... \ 




AW ... \ 









= 


■• 















. ■•. 


V 




kW J 




[o .. 


. t(^) / 


A = (^(1) 


... ^W) 


A=(A« ... 





Remark 2.7 The specific notation for assembly operators aims at emphasizing at their spe- 
cific role in terms of parallelism for the methods. Moreover, the only operation that requires 
exchange of data between subdomains is the use of non-transposed assembly operators. 



Fundamental system pOj) then reads: 



or in condensed form: 



kfX" = 
Kfu" = 



50 ❖ 7 ❖ I \ <> 



AXl = 

Aul = 

The orthogonal property of assembly operators (j9ap simply reads: 



Relation (I9dll reads: 







AA = diag(multiplicity) 



(27a) 

(27b) 
(27c) 

(28a) 
(28b) 
(28c) 

(29) 
(30) 



Remark 2.8 For improved readability, we will denote by bold font objects defined in a 
unique way on the interface (ie "assembled" quantities). Schematically, assembly operators 
enable to go from block notations to bold notations and transposed assembly operators realize 
the reciprocal operations. 
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2.4-5 Brief review of classical strategies 

We can define general strategies to solve system (p7|) or ([28]): 

Primal approaches [104^ 11051 EH 1102^ [75| 11061 1103j a unique interface displacement un- 
known satisfying equation (j28cp is introduced, then an iterative process enables to 
satisfy ()28bp while always verifying (j28ap . 

Dual approac hes [HI [SU HJl USj [391 ESI [7] a unique interface effort unknown Af, satisfying 
equation (|28bp is introduced, then an iterative process enables to satisfy (|28cp while 
always verifying (j28ap . 

Three fields approaches [IH [HTj [92] a unique interface displacement is introduced, 
then relation (j28cp is dualized so that interface efforts are introduced as Lagrange 
multipliers which yet have to verify relation (|28bp . Then the iterative process looks 
simultaneously for (A^,itf,,u^) verifying exactly equation (j28ap . As this method is 
mostly designed for non-matching discretizations it will not be exposed in the re- 
maining of this paper, anyhow a variant of the dual method which is equivalent to 
the three-field method with conforming grids will be described. 

Mixed approaches [54[l66 [[1001 l99j new interface unknown is introduced which is a linear 
combination of interface displacement and effort, ~ -^fe "I" -^6*^6' then the interface 
system is rewritten in terms of unknown this new system is solved iteratively and 
then and are postprocessed. Of course matrix is an important parameter of 
these methods. 

Hybrid approaches [35| [771 [3^ [57] interface is split into parts where primal, dual or 
mixed approaches are applied, specific recondensation methods may then be applied. 

Many different methods can be deduced from these large strategies, the most common will 
be presented and discussed in section [31 Anyhow since iterative solvers are used to solve 
interface problems, we recommend the reader to refer to appendix [X] where most used 
solvers are presented, including important details about constrained resolutions. 

3 Classical solution strategies to the interface problem 

The aim of this section is to give extended review of classical domain decomposition meth- 
ods, the principle of which has just been exposed. The association with Krylov iterative 
solvers is an important point of these methods, appendix [X] provides a summary of impor- 
tant results and algorithms that are used in this section. 

3.1 Primal domain decomposition method 

The principle of primal domain decomposition method is to write the interface problem 
in terms of one unique unknown interface displacement field ui^. The trace of local dis- 
placement fields then writes = A^u^. Because of the orthogonality between assembly 
operators ([29]), equation (|28cp is automatically verified. Using equation ()28bp to eliminate 
unknown reaction A^ inside (j28ap . we get the primal formulation of the interface problem: 



Operator Sp is the global primal Schur complement of the decomposed structure, it 



Using a direct solver to solve system (jSip implies the exact computation of local contribu- 
tion, the sum of these contributions (in a parallel computing context, this step correspond 




(31) 
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to large data exchange between processors) and the inversion of the global primal Schur 
complement which size is the global geometric interface (the size of which is far from being 
neglectable) and which sparsity is very poor (each interface degree of freedom is connected 
to degrees of freedom belonging to the same subdomains) . Using an iterative solver is much 
less expensive since the only required operations are matrix-vector products which can be 
realized in parallel because of the assembled structure of global primal Schur complement; 
moreover excellent and rather cheap preconditioner exists. Note that if global matrix K 
is symmetric positive definite then so is operator Sp and then popular conjugate gradient 
algorithm can be used to solve the primal interface problem, in other cases solvers like 
GMRes or orthodir have to be employed. 

3.1.1 Preconditioner to the primal interface problem 

An efficient preconditioner S^~^ is an interface operator giving a good approximation of 
the inverse of Sp. Various strategies are possible. For instance, a direct preconditioning 
method is based on the construction of an approximate Schur complement from a simplified 
structure defined by degrees of freedom "near" the interface. Anyhow such a method does 
not respect the repartition of the data through processors. A good parallel preconditioner 
has to minimize data exchange. 

Since operator Sp is the sum of local contributions, the most classical strategy is then 
to define S~^ as a scaled sum of the inverse of local contributions: 

S-^ = AS^+AJ' = AS^A^ (32) 

Since Sp~^ = requires the computation of local problems with given effort on the interface, 

this preconditioner is called the Neumann preconditioner. Scaled assembly operator A can 
be defined the following way [55] : 

A = (AM^A^) AM* (33) 

where is a parameter which enables to take into account the heterogeneity of the 
subdomains connected by the interface. It should make matrix (AAf^A^) easily invertible 
and give a representation of the stiffness of the interface, most commonly: 

• M* = for homogeneous structures, 

• = diag{K^^) for compressible heterogeneous structures, 

• M* = fj,'^ for incompressible heterogeneous structures (//* is the diagonal matrix the 
coefficients of which are the shearing modulus of interface degrees of freedom) . 

The (s) notation makes it easier to understand implementation of scaled assembly op- 
erators: 

5p 1 = M^'^A^'^sl^^A^'^^M^''^ (34) 

s 

• M^*) = diag( ^^^^.p^^.^y ) for homogeneous structures, 

• M^^' = diag( '''' //) ) for compressible heterogeneous structures (assuming i rep- 

resents the same degree of freedom shared by the j subdomains) 



18 



P. Gosselet and C. Rey 



• ') = diag( — '—(jj) for incompressible heterogeneous structures (assuming i repre- 

sents the same degree of freedom shared by the j subdomains) 
The following partition of unity result clearly holds: 

AA^ = It (35) 
^m(^' = It (36) 

s 

3.1.2 Coarse problem 

The use of dual Schur complement is associated to an optimality condition, as said earlier 
vector being multiplied by the pseudo inverse should lie inside the image of S^. Since 
preconditioning is applied to residual r, the optimality condition reads: 

Rfk^r = (37) 

and introducing classical notation G = AI?^, G'^r = 0. Such a condition can then be 
interpreted as an augmented-Krylov algorithm (see section \K7l\ . Once equipped with that 
augmentation problem, the primal Schur complement method is referred to as the "balanced 
domain decomposition" (BDD [741 1102] ). Algorithm 13.11 summarizes the classical BDD 
approach, and figure [8] provides a schematic representation of the first iteration of the 
preconditioned primal approach. 



Algorithm 3.1 Primal Schur complement with conjugate gradient 



1 


Set P = I - G{G'^SpG)-^G^Sp 




2 


Compute uo = G{G'^ SpG)-^G^bp 




3 


Compute rQ = bp — SpUo = P^bp 




4 


zq = Sp^ro set wq = zq 




5 


for j = 0, . . . ,m do 




6 


Pj = SpPwj (notice SpP = P'^Sp = 


P^SpP ) 


7 


aj = {zj,rj)/{pj,Wj) 




8 


Uj+i = Uj + ajWj 




9 


^j+i = - (^jPj 




10 


^j+i = Sp^rj+i 




11 


For ^ z ^ j, (5] = -{zj+i,pi)/{wi, 


Pi) 


12 


Wj + l = Zj + i + Y.l=i P]wi 




13 


end for 





3.1.3 Error estimate 

The reference error estimate is the one linked to the convergence over the complete structure: 
■^^^^IjIP^. Assuming local inversions are exact, we reach the following result: 

\\Ku - /II _ \\SpUi, - hpW 



(38) 

During the iterative process HSpU;, — ^p|| is the norm of residual r as computed line 9 
of algorithm 13.11 so the global convergence can be controlled by the convergence of the 
interface iterative process. 
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local Dirichlet problems 



^ A ^initial residual 



Initial estimate u„=0 



connected non-balanced subdomains 



♦ 



scaling ± splitting 
force gap 



local Neumann problems 



non-connected subdomains 




_ averagmg 
scaling ^ displacement gap 



new residual 




local Dirichlet problems 



New estimate 



Figure 8. 



Representation of first iteration of preconditioned primal approach 



3.1.4 P-FETI method 

The P-FETI method is a variation of BDD proposed by [49 ^ [50] inspired by the dual ap- 
proach (the reader should refer to the dual method before going further inside P-FETI). Its 
principle is to provide another assembly operator which incorporate rigid body elimination 
by a dual-like projector. 

S-^ = mS^M^ (39) 
= - A^QG{G^QGy^ G'^ (40) 

The choice of matrix Q is guided by the same considerations as in the dual method. It 

is worth noting that when Q is chosen equal to the Dirichlet preconditioner of the dual 

_ J' _ 

method (Q = A S^A) then the P-FETI method is equivalent to the classical balanced 
domain decomposition. 

3.2 Dual domain decomposition method 

The principle of dual domain decomposition method is to write the interface problem in 
terms of one unique unknown interface effort field Af,. The trace of local effort fields then 
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writes = A'^A^. Because of the orthogonality between assembly operators (p9|) . equation 
()28bp is automatically verified. In order to eliminate unknown interface displacement field 
using (j28cp . we first obtain it from equation (j28a|) (or equivalently (|27ap ): as seen in ([21]) the 
inversion of local systems may require the use of generalized inverse and the introduction of 
rigid body motions the magnitude of which is denoted by vector a^^\ the use of generalized 
inverse is then submitted to compatibility condition. 

ut = S2{b; + g'X,) + Rta- (41) 
Rf{b;+g'X,) = (42) 

The first line is then premultiplied by A (same expressions could be obtain from non con- 
densed notations). 

G = KRl 
= Rfbl = R^^f 

we get the dual formulation of the interface problem: 




This is the basic dual Schur complement method, also called Finite Element Tearing and 
Interconnecting method (FETI \A\\ I43j). For similar reasons to the primal Schur comple- 
ment method, this system is most often solved using an iterative solver, then we will soon 
discuss the preconditioning issue and how the G^Xh + = constraint is handled. Let us 
first remark that global dual Schur complement is non-definite as soon as redundancies 
appear in the connectivity description of the interface, anyhow it is easy to prove [51] that 
local contributions = A^A^ are unique (non-definition only affect the "artificial" split- 
ting of forces on multiple points), and that because the right hand side lies in range (A) 
iterative process converges; other considerations on the splitting of physical efforts between 
subdomains will lead to improved initialization (see section [3.2.41 and |60]). 

3.2.1 Preconditioner to the dual interface problem 

Like it is done in the primal approach, the most interesting preconditioners are researched 
as assembly of local contributions, and the global dual Schur complement being a sum of 
local contributions, optimal preconditioner is a scaled sum of local inverses. 

Id ^ = kSd^k^ = AS;A^ (44) 

Because this preconditioner uses local primal Schur complement, which corresponds to 
the local resolution of imposed displacement problems, it is commonly called the Dirichlet 
preconditioner. One interesting point is the possibility to give approximation of the local 
Schur complement operator leading to the following preconditioners: 

Sp K^f^ lumped preconditioner (45) 

Sp diag(K^^) superlumped preconditioner (46) 

These preconditioners have very low computational cost (they do not require the compu- 
tation and storage of the inverse of local internal matrices Kf-~^), even if their numerical 
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efficiency is not as strong as the Dirichlet preconditioner, they can lead to very reduced 
computational time. 

Scaled assembly operator A can be defined the following way [65] : 

A = (^AM*"^ A^) ^ AM*"^ (47) 

where is the same parameter as for the primal approach. Such a definition is not that 
easy to implement, an almost equivalent strategy is then used, easily described using the 
(s) notation: 

= J]]m("U(")4")#)^M(^) (48) 

s 

• M^*^ = diag(^^^^jijj^jj^) for homogeneous structures, 

( ^ diafffA'^^^V 

• M}^' = diag( ] ) for compressible heterogeneous structures (assuming i rep- 

resents the same degree of freedom shared by the j subdomains and (r) is the subdo- 
main connected to (s) on degree of freedom i), 

• M}^' = diag( — ^—(jf) for incompressible heterogeneous structures (assuming i repre- 

J2j Mi 

sents the same degree of freedom shared by the j subdomains and (r) is the subdomain 
connected to (s) on degree of freedom i). 

We have the following partition of unity result: 

AA ^ = It (49) 

and the following complementarity between primal and dual scalings |60j : 

A^A + A^A = r (50) 
^W^M(^)a(") + A(")^mWaW = I^M (51) 

3.2.2 Coarse problem 

Admissibility condition G^A^ + = 0, can be handled with an initialization / projection 
algorithm (see section [AH): A;, = Aq + PA* with G^Ag = -e* and G^P = 0. 

Ao = -QG{G^QGy\^ (52) 
P = I-QG{G^QGy^G^ (53) 

The easiest choice for operator Q is the identity matrix, projector P is then orthogonal, 
this choice is well suited to homogeneous structures. For heterogeneous structures, matrix 
Q has to provide information on the stiffness of subdomains, then Q is chosen to be a 

version of the preconditioner leading to "superlumped projector" (Q = A diag(A'^(^)A ), 

"lumped projector" {Q = AK^^^A ) and "Dirichlet projector" {Q = ASpA ). Superlumped 
projector is often a good compromise between numerical efficiency and computational cost. 

Algorithm l3.2l presents a classical implementation of FETI method, and figure [9] provides 
a schematic representation of the first iteration of the preconditioned dual approach. 
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Algorithm 3.2 Dual Schur complement with conjugate gradient 

1: Set P = / - QG{G^QG)-^G^ 

2: Compute Ao = -QG{G^QG)-^e 

3: Compute ro = P^bd — SdXo) 

4: zq = PS^^ro set wq = zq 

5: for j = 0, . . . , m do 

6: pj = P^SdWj 

7: aj = {zj,rj)/{pj,Wj) 

8: Xj+i = Xj + ajWj 

9: rj+i = rj -j^jpj 

10: Zj+i = PSj^rj+i 

11: For ^ i ^ j, /3j = -{zj+i,pi)/{wi,pi) 

12: Wj+i = Zj+i + X;i=i 

13: end for 

14: a* = (G^gG)-iGV^ 

15: = ir*+Am + -R^a* 



5.2.5 Error estimate 

The convergence of the dual domain decomposition method is strongly linked to physical 
considerations. After projection, the residual can be interpreted as the jump of displacement 
between substructures: 



-SdX) 



Au^ = A{u) 



(54) 
(55) 



Anyhow, such an interpretation cannot be connected to the global convergence of the sys- 
tem. In order to evaluate the global convergence, a unique interface displacement field 
has to be defined (most often using a scaled average of local displacement fields) and used 
to evaluate the global residual. When using the Dirichlet preconditioner, it is possible 
to cheaply evaluate that convergence criterion. Average interface displacement Ub can be 
defined as follow: 

Ub = A^A (56) 

then, from equation ([38]) . convergence criterion reads: \\Ku — f\\ = \\ASpA. r\\. So when 
using the Dirichlet preconditioner, the evaluation of the global residual only requires the 
use of a geometric assembly after the local Dirichlet resolution. 

3.2.4 Interpretation and improvement on the initialization 

Let us come back to the original dual system (127p . 



K^u^ -- 



(57) 
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initial residual 

• •! \/ 



Initial estimate Ao=0 



local Neumann problems / 

balanced non-connected subdomains 




♦ 



scalingX splitting 
displ. gap 



local Dirichlet problems 



in 



Non-balanced subdomains 

averaging 



scaling 



reaction gap 




^ " new residual 



local Neimiann problems 



New estimate 
Figure 9. 



Representation of first iteration of preconditioned dual approach 
And suppose this system is being initialized with non zero effort A^qI 







let Afc 










= r+e^'A^x, 


with f 


= r+t^VA,o 



(58) 



;60 



(59) 
(60) 

So initialization Af,Q can be interpreted as modification t^^A^Xi^Q of the intereffort between 
substructures: local problems are defined except for an equilibrated interface effort field; 
the only field that makes mechanical sense (and that is uniquely defined) is the assembly 
of interface efforts. 



Af^f = Aff = fb global interface effort 



because Aff^A^X. 



bo 



Abo = 



(61) 
(62) 



Non-zero initialization then can be interpreted as a repartition of global interface effort 
/(,. Two strategies can be defined in order to realize that splitting. 
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Classical effort splitting Though sphtting is hardly ever interpreted as a specific initial- 
ization, it is commonly realized, based on the difference of stiffness between neighbor- 
ing substructures (that idea is strongly connected to the definition of scaled assembly 
operators): the aim is to guide the stress flow inside the stiffer substructure, sticking 
to what mechanically occurs. 

Global interface effort fb is then split according to the stiffness scaling (M* = 
diag{K^fj)) , which leads to modified local effort 

= M^A{AM^A^y^ fb (63) 

Complete effort is constituted by inside the substructure ((/ — t'^^t^)/'^) and 
split effort on its interface (t^^/^)- 

7o = (/-toV)r+to^7^ (64) 

Because of the complementarity between scaled assembly operators (|5Up . final effort 
reads 

f = f- f^A^ (aM^-^A^) ^ AM^'-h^f (65) 



Interface effort splitting [60] If we start from condensed dual system ([28 



t;"*-)/* — 4- X 

Aul = ^ ' 

condensed efforts can be split along the interface as long as global condensed effort 
remains unique. Assembled condensed interface effort reads bp = Abp, if it is split 
according to the stiffness of the substructures: 

6; = M^A (AM^A^) bp (67) 
We have, using the complementarity between scalings: 

hl = hl- A^ (aM^-^A^) ^ AAr-^bl (68) 
Or in a non-condensed form: 

f = f - e^A^ {^AAr-^A^^ ^ AAr-Hl (69) 

As will be shown in assessments, the classical splitting leads to almost no improvement of the 
method while the condensed splitting can be very efficient for heterogeneous structures. In 
fact the initialization associated to this splitting can be proved to be optimal in a mechanical 
sense; it can also be obtained from the assumptions used for the primal approach. 

Primal approach initialization is realized supposing that interface displacement field is 
zero on the condensed problem; from (j66p we get: 

A%o + ^;=^0 (70) 

which could only be the solution if null interface displacement was the solution. Then local 
interface efforts are split into an equilibrated part and its remaining q^: 



% = A^7 + q'' 
7 = [AD^K^Y AD'^U 



(71) 
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-D^ is a symmetric definite matrix, remaining is orthogonal to range(Z)*A). If the system 
is initiahzed by: 

Afeoo = -7 (72) 

then initial residual A"^A(,qq + 6* = — is minimal in the sense of the norm associated to 
. If = diag(i^6;,*)~^ then initialization is equivalent to the splitting of condensed 

efforts according to the stiffness of substructures ; diag(ir6;,^) being an approximation of 

Sp^ that norm can be interpreted as an energy. 

The initialization by the splitting of condensed efforts has to be made compatible with 

solid body motions by the computation of: 

Abo = ^600 - QG {G^QGY' e<> (73) 

Remark 3.1 If = was not computationally too expensive then improved initializa- 
tion with Dirichlet preconditioner would lead to immediate convergence. 

Remark 3.2 The recommended choice = diag(/Cfefe^)~^ is computationally very cheap, 
the heaviest operation is the computation of condensed efforts ( one application of Dirichlet 
operator). Then if the Dirichlet preconditioner is used, new initialization is just as expensive 
as one preconditioning step but it can lead to significant reduction of iterations, so it should 
be employed. Of course if light preconditioner is preferred classical splitting should be used. 

3.3 Three fields method / A-FETI method 

The A-FETI method [82, 81j can be explained as the application of the three-field strategy 
to conforming grids, this method is widely studied in [92j. Back to (j28p . we have 

s;ut = b; + xt (74) 

AA^ = (75) 
Aul = (76) 

A-FETI method is based, like in the primal approach, on the introduction of unknown 
interface displacement field Uf^, the continuity of displacement then reads: 

ul = A^Ub (77) 

but local displacements are not eliminated like in the primal approach, complete system 
reads: 

-/ 




f 








\-\ 




Ubj 




^0 



A^ 1 I A^ = I I (78) 
A 

In order to eliminate interface displacement ui, a specific symmetric projector is introduced: 

B = I - A^ {AA^y^ A (79) 

B realizes the orthogonal projection on ker(A) {AB = 0). Since AA^ = then A^ can be 
written as 

K = Bfil (80) 

ji^ is a new interface effort, corresponding (recall AA^ = diag(multiplicity)) to an average of 
original A^. Introducing last result and using B^ A?" = to eliminate interface displacement 
we have 

% -o1 (;|) - © 
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Then using classical elimination of local displacement by the inversion of the first line of 
the previous system, we get 

ut = S;+{b; + Bfit)+Rta- (82) 
Rf{b; + Bf,t) = (83) 



which leads to 



Rf B (a'; " I -Rf b'^ 



p 1 (84) 



This system is very similar to the classical dual approach system, and in consequence 
is solved in the same way (using projected algorithm). Anyhow the main difference is 
that Lagrange multiplier fi^ is defined locally on each subdomain and not globally on the 
interface. 

It was proved in [92] that A-FETI is mathematically equivalent to classical FETI with 
special choice of the Q matrix parameter of the rigid body motion projector. In fact if 

Q = diag( ^^^^jp^^^^.y ) then FETI leads to the same iterates as A-FETI. Moreover operator 
B is an orthonormal projector which realizes the interface equilibrium of local reactions fi"^, 
it can be analyzed as an orthonormal assembly operator as described in figure [71 

To sum up, A-FETI can be viewed as the conforming grid version of the three-field 
approach, a specific case of classical FETI, and a dual approach with non-redundant de- 
scription of the connectivity interface with orthonormal assembly operator. 

3.4 Mixed domain decomposition method 

Mixed approaches offer a rich framework for domain decomposition methods. It enables 
to give a strong mechanical sense to the method, mostly by providing a behavior to the 
interface. The mixed approach is one of the bases of the LaTIn method [USl [HZl [SH [SS] , a 
very interesting strategy designed for nonlinear analysis; as we have restrained our paper to 
linearized problems, we do not go further inside this method which would deserve extended 
survey. Several studies were realized on mixed approaches, these methods possess strong 
similarities, we here mostly refer to works on so-called "FETI-2-fields" method [100^ I99|. 
The principle of the method is to rewrite the interface conditions: 



AA^ = (85) 
Aul = 



in terms of a new local interface unknown, which is a linear combination of interface effort 
and displacement. 

^,t = Xt + T^ut (86) 

is homogeneous to an effort and can be interpreted as an interface stiffness. Mixed 
methods thus enable to give a mechanical behavior to the interface, in our case (perfect 
interfaces) it can be mechanically interpreted as the insertion of springs to connect sub- 
structures. New interface condition reads: 

A^AXl + T^A^Aut = 0* (87) 
or A^A//^ - (A^AT^* - T^^A^A) < = 0* (88) 

It is of course necessary to study the condition for this system being equivalent to system 
(|85p . It is important to note that two conditions lying on the global interfaces (geometric 
and connectivity) were traced back to the local interfaces, so up to a zero- measure set 
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(multiple points) the conditions have the same dimension. The new condition is equivalent 
to the former if facing local interfaces do not hold the same information which is the case 
if matrix [A'^AT^ - T^A^A) is invertible. An easy method to construct such matrices will 
be soon discussed 

If unknown /j,^ is introduced inside local equilibrium equation, the local system reads: 

{s; + T,-)ut = b; + fii (89) 

If we assume that is chosen so that (S'p + T^) is invertible then we have: 

ut = {s; + n)-'{f,t + b;) (90) 

Then substituting this expression inside interface condition (jSSp . interface system reads: 



(a^a - (A^Ar,o - r,OA^A) {s; + T^y'j f^t 

= {A^AT^ - T^A^A) {S; + T^*) (91) 

so mixed approaches have the originality to rewrite global interface conditions on the local 
interfaces and to look for purely local unknown (which means that the size of the unknown 
is about twice the size of the unknown in classical primal or dual methods). 

This general scheme for mixed methods has, as far as we know, never been employed. 
A first reason is that it leads to certain programming complexity, second the manipulation 
of zero-measure interfaces is not easy for methods aiming at introducing strong mechanical 
sense and hard to justify from a mathematical point of view. So most often a simplified 
method is preferred, which takes only into account non-zero-measure interfaces. Such an 
approach simplifies the connectivity description of the interface, every relationship on the 
interface only deals with couples of subdomains. In order to have the clearer expression 
possible, we present the algorithm in the two subdomains case. Interface equilibrium reads: 



which is equivalent to 



4" - = 

A<" + Af'=0 



AP' + Af>+r<^> uf-„<" =0 



(92) 



(93) 



under the condition of invertibility of (T*^-*^^ -I-T'^^-'). Introducing unknown fi^^ = X^^ + 
interface system reads: 



^w + ^(2)_(r(i)+r(2))na)=o ^ ^ 



Local equilibrium reads: 




(1) _ ,,(1) , 

(95) 



5f +r(2))nf)=^f)+6f 
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Assuming T^^^ is chosen so that matrix \^Sp + T^^^ j is invertible, we can express displace- 

(s) 

ment from local equilibrium equation, and suppress it from global interface conditions, 
which leads to: 

J j_(T(i)+r(2))(42)+T(2))"'\ / _ 

(r(i) + r(2)) (^sP + r(i)) / / U^'V " 

(r(i)+r(2)) (5i') + r(2))''6f \ 
(T(i)+T(2)) (^5i')+r«)"'6(^7 ^^^^ 

This expression enables to give better interpretation of the stiffness parameters T^*^. Sup- 
pose T(^) = Sp^^ and T^^^ = Sp^^ then matrix ([96j) is equal to identity and solution is directly 
achieved. So the aim of matrix T^^^ is to provide one substructure with the interface stiffness 
information of the other substructures. 

If we generalize to A^-subdomain system (|9ip . we can deduce that the optimal choice 
for T^'^^ is the Schur complement of the remaining substructures on the interface of domain 

(s) _ 

(s) (some kind of Sp where s denotes all the substructures but s). Of course such a 
choice is not computationally feasible (mostly because it does not respect the localization 
of data), and approximations have to be considered. In decreasing numerical efficiency and 
computational cost order, we have: 

• Approximate the Schur complement of the remaining of the substructure by the Schur 
complement of the neighbor; 

• approximate the Schur complement of the neighbor by the Schur complement of the 
nearer nodes of the neighbor (" strip" -preconditioners which idea is developed in |83] 
in another context); 

• approximate the Schur complement of the neighbor by the stiffness matrix of the 
interface of the neighbor (strategy of dual approach lumped preconditioner). 

The second strategy is quite a good compromise: it respects data localization, it is not 
computationally too expensive and yet it enables the propagation of the information beyond 
the interface. Of course an important parameter is the definition of elements "near the 
interface" , which can be realized giving an integer n representing the number of layers of 
elements over the interface. 

3.4-1 Coarse problem 

Because the interface stiffness parameter T'^ regularizes local operators Sp, local operator 
(Sp + T^) is always invertible. Such a property can be viewed as an advantage because it 
simplifies the implementation of the method introducing no kernel and generalized inverse; 
but it also can be considered as a disadvantage because no more coarse problem enables 
global transmission of data among the structure. Then the communications inducted by 
this method are always neighbor-to-neighbor which means that the transmission of a lo- 
calized perturbation to far substructures is always a slow process. It is then necessary to 
add an optional coarse problem (see section IA.7P . Most often the optional coarse problem 
is constituted of would-be rigid body motions (if subdomains had not been regularized). 
Another possibility, which is proposed inside the LaTIn method is to use rigid body mo- 
tions and extension modes of each interface as coarse problems, this leads to much larger 
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coarse space. The coarse matrix corresponds to the virtual works of first order of defor- 
mation of substructures; so mechanically it reahzes and propagates a numerical first order 
homogenization of the substructures. 

3.5 Hybrid approach 

The hybrid approach (see [S7j for a specific apphcation) is a proposition to provide a 
unifying scheme for primal and dual approaches though it could easily be extended to other 
strategies. It relies on the choice for each interface degree of freedom of its own treatment 
(for now primal or dual). So let us define two subsets of interface degrees of freedom: the 
first is submitted to primal conditions (subscript p) and the second to Neumann conditions 
(subscript d). Local equilibrium then reads (p = iU d, b = dU p, pCi d = 

Kpp ^pp\ ('^t\ ^ ( /A j_ (^'d^Ki\ (97) 
Kpp K^pJ \UpJ \fp J \ K J 

Preferred interface unknowns are unique displacement on the first subset Up and unique 
effort on the second subset A^. Local contributions then reads: 

ul = A^Up (98) 
AS = AjA, (99) 

which ensure the continuity of displacement on the p degrees of freedom and the action- 
reaction principle on the d degrees of freedom, of course operators Ap and have been 
restricted respectfully to the p and d subsets. Remaining interface conditions read: 

A,txS = A,t>^ = (100) 
ApXl = (101) 

To obtain the global interface system, first local unknown Up has to be eliminated: 

= + (/? + tf - K^puf) + Rla^ (102) 

Applying continuity condition to previous result and equilibrium condition to the second 
row of (j97|) . interface system reads: 






(Up] 




( bp\ 




Ad 




-h] 








{-eo 



(103) 



with the following notations: 



_ (Ap \ ^ (Ap 
"^^^"VO Ad) ^"[O A, 
Gp = ApKpp Rp, = A^t^Rp 

^ ^^p J p 

bp = Ap [fp — KppKpp^ fp) , b^ = A^t'^Kpp^ 



This interface problem corresponds to the constrained resolution of one linear system. The 
constraint is linked to the possible non-invertibility of matrix Kpp and thus to the choice 
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of primal subset. Notice that Gp represents the reaction of primal degrees of freedom to 
zero energy modes of Kpp and then should be zero in most cases (it may be non zero 
in buckling cases). Moreover this system may represent classical primal approach (if all 
interface degrees of freedom are in subset p) or classical dual approach (if all interface 
degrees of freedom are in subset d). Operator Spd is a primal/dual Schur complement, it is 
the sum of local contributions S^^ (f25l) . 

The above system is nonsymmetric semi-definite (because of redundancies on the dual 
subset) positive, it has to be solved by GMRes-like algorithm. 

3.5.1 Hybrid preconditioner 

Inspired by primal and dual preconditioners, we propose to approximate the inverse of the 
sum of local contributions by a scaled sum of local inverses. 

-1 _ \ fAp Y 



Scaled assembly operator are defined in the same way as in primal and dual approaches. 
3.5.2 Coarse problems 

As said earlier, depending on the choice of subset p, local operator Kpp (involved in the 
computation of S^^) may not be invertible and, like in dual approach, a first coarse cor- 
rection has been incorporated inside the hybrid formulation. Anyhow local operator K^g 
involved in preconditioning step may also not be invertible and, like in primal approach, 
a second coarse problem has to be added to make the preconditioner optimal. Then, the 
optimal version of the hybrid system incorporates two coarse problems handled by specific 
initialization/projection algorithm. The admissibility condition reads: 

G^x = e (105) 

with G = ' ^ ~ (^A^) ^'^'^ ^ ~ i^—h ) ■ If = ^ ~ SpdX stands for the residual 

before preconditioning, the optimality condition reads: 



with Hp = AptpR^ and = A^K'^jR^ (as said before most often iJ^ = 0). To sum up 
constraints: 

H^SpdX = H^b 

Handling such constraints is described in section IA.8I 

Figure [TOl provides a schematic representation of the first iteration of the preconditioned 
hybrid approach, in the specific case of a nodal partition of the interface. Assessments will 
deal with partition at the degree of freedom level. 

3.5.3 Error estimate 

Because GMRes-like solver is used, the euclidian norm of the residual is directly available, 
such a norm is the sum of displacement gap on the d part of the interface and an effort 
gap on the p part of the interface. For now, no other estimator with better physical sense 
is available. 
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Figure 10. 



Representation of first iteration of preconditioned hybrid approach 



4 Adding optional constraints 

The aim of optional constraints is to ensure that the research space for the iterative solver 
possesses a certain regularity. The choice of constraints will be discussed in a later section. 
Going back to the interface system: 



(108) 



any constraint of the form Cj Kul = or C"^AA^ = is trivially verified by the solution 
fields, it is just a restriction of continuity/equilibrium conditions. From an iterative process 
point of view, these conditions will be reached once converged ; the principle of optional 
constraints is to have every iteration verify that condition. 

There are two classical solutions to ensure these optional constraints: either to realize 
a splitting of research space and ensure, using a projector, that the resolution is limited 
to convenient subspace, or to realize a condensation of constraints and make iterations 
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on a smaller space. In other words suppose there are ric independent constraints in a n- 
dimension space the first strategy researches n-sized solution in a (n — nc)-ranked space, 
while the second solution researches (n— nc)-sized solution in a (n— nc)-dimension space then 
deduce the n-sized solution. From a numerical point of view both solutions are equivalent, 
they are just two ways of handling the same constraints, anyhow from implementation and 
computational points of view they have strong differences. 

We will essentially focus on application to primal and dual domain decomposition meth- 
ods. 

4.1 Augmentation strategy 

For this strategy the constraint is reinterpreted in terms of constraint on the residual. 
Typically, the primal approach can be augmented by constraints on the effort field while 
the dual approach can have constraints on the displacement field: 

C^AXl = -C^ {Abl-AS^A^Ub) (109) 
C^Aul = {AS^A^>^ + ^) (110) 

The constraint is then handled as an augmentation inside the iterative solver, its is classi- 
cally realized using a projector (see sections lA.71 and lA.SP . 

4.2 Recondensation strategy 

This strategy was recently introduced in the framework of the dual approach, leading to the 
FETIDP algorithm [341 \35\ l69| [63] . Because for now only constraints on the field have 
been considered we will restrain to this kind of constraints, the application of constraints 
on is straightforward. 

4-2.1 Basic method 

We first consider constraints which impose continuity of specific degrees of freedom, in other 
terms we suppose matrix C_ is identity on certain degrees of freedom and zero elsewhere; 
we will show how any constraint can be rewritten in such a form. Because these degrees 
of freedom will be submitted to a primal treatment we will denote them with subscript p 
while the remaining of the interface will be denoted with subscript d. Constraint reads: 

= C-A<=(»)"(|^|)=A,„; (111) 

Like in the hybrid approach this constraint is verified using a unique displacement field 
on the primal part of the interface: Up = ApUp. Interface system then reads like in the 
hybrid approach, with the additional assumption that the constraints are so that the local 
problem possesses enough Dirichlet conditions to make it invertible. 

S„,. ('tA = ( ^l) (112) 



Introducing following notations for blocks composing 5"^^: 



then 



1 1 
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Unknown Up is condensed on the remaining of the interface: 

■"p = ^pp i^P - ^pd>^d) (115) 
Sd^d= {Sdd + SdpSppSpd) = -kd + ^dpSpphp (116) 

The latest equation is solved using an iterative solver, operator has the same properties as 
the restriction of the dual operator to the d-part of the interface (semi-definition, symmetry 
and positivity). Operator is the assembly of local contributions 

Sd = AdSdAS = Arf (s^dd + 4p^pSpp^ps;d) 4l (117) 

Using operator requires the computation of the inverse of matrix Spp = ApSppA^, which 
is an assembled matrix. Then this formulation includes a global coarse problem set on 
primal variables. 

The recommended preconditioner for such an approach is directly inspired by the dual 
approach: it consists in solving local Dirichlet problems with scaled imposed displacement 
on the d-part of the interface and null displacement on the p part of the interface and 
extracting the average reaction of the d-part of the interface. Then the preconditioner 
reads: 

ld^ = iOp k)S;{Op kf (118) 

Figure [TT] provides schematic representation of the first iteration of preconditioned 
FETID method. 

4-2.2 More complex constraints 

We now consider constraints which are not limited to one degree of freedom, for instance 
one can consider that we want to ensure that the average jump of displacement on one edge 
is equal to zero, which involves all the degrees of freedom of the edge. 

The classical solution [63] is to realize a change of basis of degrees of freedom (denoted 
by matrix T^) so that each constraint is represented by one "modified" degree of freedom. 
The change of basis is the same local operation realized on every subdomain, then we can 
define a global change T so that AT* = TA 

C^Aut = C^AT%t = C^TAul = C^Aut (119) 

withC = (J^ (120) 

After the change of basis is realized the same algorithm can be employed. Because con- 
straints most often respect a certain locality of data (for instance independent constraints 
on each edge), change of basis is not a too expensive operation, and does not make too 
poor the sparsity of local matrices. 

4.3 Adding "constraints" to the preconditioner 

This subsection deals with the dualization of the recondensation strategies |19[ [23| . For 
instance the balanced domain decomposition with constraints (BDDC) is a primal version 
of FETIDP: during the preconditioning step, the continuity of displacement is ensured at 
specific degrees of freedom (which can be the result of a local change of basis), so that 
the local Neumann operator remains fully invertible. So solving classical primal approach 
problem 

SpUb = AS^A^Ub = bp (121) 



34 



P. Gosselet and C. Rey 



initial residual 



local recondensed problems 



Initial estimate An=0 




Non-balanced subdomains 
averaging 



scaling 



reaction gap 




balanced non-connected subdomains 



local Dirichlet problems 



local recondensed problems 



New estimate 



I 



scaling I splitting 
displ. gap 




new residual 




Figure 11. 



Representation of the first iteration of preconditioned FETIDP 



the preconditioner reads 

Figure [11] provides a schematic representation of the first iteration of preconditioned 
BDDC method. 

5 Classical issues 

In this section we try to provide answers to questions that commonly arise when using 
domain decomposition methods. 

5.1 Rigid body motion detection 

Handling floating substructures is definitely a very accurate issue. This difficulty is one 
of the reason of the success of methods which regularize the stiffness of subdomains like 
FETIDP or mixed approaches, leading to fully invertible matrices. Anyhow basic primal 
and dual methods remain competitive (mostly because zero-energy modes provide a very 
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Representation of the first iteration of preconditioned BDDC 



natural coarse problem) , hence providing an efficient algorithm for the computation of rigid 
body motion is essential. Many strategies can be used \33\ HQ] , and this review does not 
claim to be exhaustive. 

First we have to discuss the exact composition of the possible kernel of substructures. 
What can be found is: 

• rigid body motions of floating substructures, 

• internal mechanisms of substructures, 

• weird things due to nonlinearities (buckling, exotic behaviors, ...) 

• numerical zero-energy modes. 

An internal mechanism can for instance occur when a substructure is made out of two parts 
connected by one linear edge (pivot) or one singular point (kneecap). Methods exist to 
avoid such substructures either inside the decomposition algorithm or as external programs 
regularizing a given decomposition |48j, and then should be employed. 
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The last two kinds of kernel are non-standard and can only be detected using fully 
algebraic methods (like Gauss pivoting). The problem with algebraic methods is their 
high sensitivity to the condition number of the stiffness matrix of substructures. The 
condition number can be influenced by the aspect ratio and the material composition of 
the substructures, even after adimensionalization the quality of the methods is hard to 
warranty. 

Finally we only develop here two strategies to handle zero-energy modes. The first one 
belongs to the purely algebraic methods, it is very simple to implement and can lead to 
satisfying results for not-too-complex problems, it can handle more than solid body motions 
but it is strongly dependent on a priori selected degrees of freedom. The second method is 
purely geometric, it is very robust but only suited to detect solid body motions. 

In order to simplify notations, we consider the research of the zero-energy modes of 
matrix K (which should be a local stiffness matrix). 

5.1.1 Simple algebraic approach 

This method is based on fundamental relationship between the kernel of a matrix and the 
kernel of Schur complement (jlSp . The principle is to preselect a small set of degrees of 
freedom which we will denote by subscript N (the other degrees of freedom are denoted 
with subscript O). Then compute explicitly primal Schur complement associated to these 
degrees of freedom: S = K^n — KnoKqqKqn)- If A^-degrees of freedom are selected so 
that Kqq is invertible (if only solid body motions have to be detected then it is sufficient to 
take the degrees of freedom associated to three non-aligned nodes) then S is well defined 
and its kernel is linked to the kernel of matrix K. 

Since is a "small" set (12 degrees of freedom is often sufficient), computing the kernel 
of matrix S using "exact" algorithm like singular value decomposition is rather cheap and 
then kernel of matrix K can be computed using equation (jl5|) . 

5.1.2 Geometric approach 

The basic idea of this method is that kinematically admissible solid body motions can 
be deduced from boundary conditions imposed on one subdomain. Let be a basis of 
candidate rigid body motions (would be solid body motions if no Dirichlet conditions were 
applied on the subdomain, is a 6-column matrix in 3D and 3-column matrix in 2D), 
and let E be the matrix of Dirichlet boundary conditions: each column of E represents one 
(combination of) blocked degree of freedom. Kinematically admissible rigid body motions 
R are linear combinations of candidate rigid body motions (hence R = RcQ) which do 
not make Dirichlet boundary conditions work {ie E^R = 0). In order to find such linear 
combination, we compute singular value decomposition of matrix E^Rc = UDV^ and set 
Q = Vq where Vq is the submatrix of V associated to negligible singular values. Because 
is a matrix only made out of geometric considerations, it is well conditioned and the 
criterion to detect "zero" singular values is well defined (there is a large gap between zero 
and non-zero singular values). 

5.1.3 Generalized inverse 

The two methods presented above led to r-ranked basis R of the kernel of matrix K. To 
compute generalized inverse , the most classical way is to select r degrees of freedom, 
so that if they were added Dirichlet conditions, rigidity matrix would be well defined. For 

instance after pivoting and renumber one could get R = [ j I and then the r last degrees 
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of freedom would suit. Then we have: 

This is just one instance of generahzed inverse, other can be built using penalization or 
other modification to matrix K. Though not theoretically prohibited, choosing "blocked" 
degrees of freedom on the interface is often a bad idea from a practical point of view. 

5.2 Choice of optional constraints 

As seen in section HI constraints can be imposed either using augmentation (using one 
or two projectors, sections I A. 71 and lA.Sp or using recondensation algorithms. In the case 
of recondensation algorithms, constraints have to be sufficient in order to suppress rigid 
body motions and then regularize the local stiffness matrix. In the case of augmentation 
algorithm, constraints have to be independent from rigid body motions which are already 
handled by the formulation. 

Because constraints are expensive to handle, they have to be chosen with care; anyhow, 
except in a few cases, there are no general results on how to choose them. Mechanical 
comprehension of studied phenomena and anticipation of convergence difficulties may lead 
to efficient strategies. In the case of solving several linear systems (even with different left 
hand sides) interesting strategies exist [98l [301 EHl EH EH] • 

The next two subsections deal with very common strategies, while the last subsection 
describes another framework for constraints inspired by the LaTIn method [661 [80] . 

5.2.1 Forth order elasticity 

As plate and shell models are often used in structural mechanics, forth order problems 
have been carefully studied [38l [321 [Ml \W3\ [IQ] . Such problems are characterized by the 
appearance of singularities at the corner of substructures (so-called " corner modes" ) which 
are destroying the scalability of usual methods. The classical solution consists in enforcing 
the continuity of the (most often only normal) displacement field at the corners in order to 
regularize the problem. From a practical point of view, corners are most often defined as 
multiple points (nodes shared by more than two substructures), that set can be enriched 
by extremities of edges. 

The implementation, just like the singularity, is strongly linked to the chosen formula- 
tion. 

Dual approach: since the projected residual corresponds to the displacement jump be- 
tween substructures ([M]) . one just has to use augmentation algorithm with one con- 
straint for each pair of neighbor at each corner. Matrix C is then made out of columns 
with one coefficient 1 on one corner degree of freedom and elsewhere. Because the 
dual description of a m-multiple point leads to (m — 1) relationships, such a coarse 
space is rather large. 

Primal approach: In order to regularize the displacement field, the constraints have to 
be imposed on the preconditioned residual (assuming Neumann-Neumann precondi- 
tioner is employed) . The aim is then to have the local contributions of preconditioned 
residual equal to zero on corner points. So if denotes the local interface matrix 
made out of columns with one coefficient 1 on one corner degree of freedom and 
elsewhere, and the same matrix scaled according to the scaling used inside the pre- 
conditioner, the constraints read C = AS^C^. Then a m-multiple degree of freedom 
leads to m constraints. 
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Recondensed approaches: FETIDP or BDDC were first designed in this context, from 
the consideration that (extended) corners constraints were sufficient to suppress rigid 
body motions, then the first level constraints could be avoided. So the methods 
directly apply since they consist in constraining the displacement field. Here whatever 
the multiplicity of a corner may be, it always leads to just 1 constraint. 

5.2.2 Second order elasticity 

Because classical methods are already scalable in the frame of second order elasticity, op- 
tional constraints are not often used in such a context. Furthermore, it is hard to predict 
what constraint should be imposed. In some cases, efficient strategies have been proposed, 
such as in [58] for nonlinear problems using Newton-Raphson solver where approximations 
of eigen vectors are used. 

The question of optional constraints arose when willing to extend recondensed algo- 
rithms (FETIDP and BDDC) to such problems, mostly because the previous definition 
of corners lead to significant problems in 3D (too many constraints, poor convergence...). 
The first solution was proposed in [69j, the idea was to select 3 non-aligned nodes on each 
face (interface between 2 subdomains) which maximized the surface of the triangle they 
defined. The current solution, the scalability of which is mathematically and numerically 
proved, is to enforce average convergence on edges [6^, which is realized by a change of 
basis described in (|119p . In order to take into account heterogeneity on the interface, the 
average may be scaled by a coefficient representing the stiffness of the subdomains. For 
more difficult problems, first order moments of edges can also be added. 

5.2.3 Link with homogenization theory 

This paragraph intends to present a mechanical vision of optional constraint which, though 
hard to implement in the framework of the presented method, may lead to better under- 
standing of what optional constraints and associated coarse problems can provide to the 
methods. This analysis is inspired by the multiscale version of the LaTIn method [66]. 

The underlying question when choosing optional constraints (except from specific nu- 
merical questions like in the forth order elasticity) is "what global information should be 
transmitted to the whole structure ?" or more precisely " what should far substructures know 
from one substructure". A meaningful answer is provided by Saint- Venant principle and 
homogenization theory: at a first order development, a substructure can be represented by 
its rigid body motions and its constant strain states (simple traction and shearing states). 
Such an idea adds six (3 in 2D) more constraints per subdomains; as these constraints are 
somehow complex to build, they can be approximated by interface modes (but of course 
the number of constraints then grows quickly). 

5.3 Linear Multiple Points Constrains 

Multiple points constraints are relationships defined between some degrees of freedom, they 
are often used in order to connect nonconforming meshes, to represent boundary conditions 
(for instance periodicity), to model contact or apply control laws. In the case of linear (ized) 
constraints we can write, on the whole structure scale: 



What seems most suited to the domain decomposition context is to dualize the constraint 
and introduce Lagrange multiplier /i in order to enforce the condition. System then reads: 



Ku = f 
Cu = a 



(124) 
(125) 




(126) 
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After decomposition we have 
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(127) 



with C so that Cu* = Cu = a which implies (since = hJ' 



Ub) 



Cu = {Ci Cb) 



then 



a, 



(128) 
(129) 



Or in other words, if matrix C deals with interface degrees of freedom, the associated 
constraints have to be correctly distributed between sharing subdomains. The constraint 
can be interpreted as specific (non-boolean) assembly operator which explains the chosen 
notation. Using MFCs with domain decomposition methods was studied in [93] in the dual 
method context. 

In order to provide general methodology to apply MFCs, we then incorporate inside 
hybrid domain decomposition method (equations (I97p to ()100p ): 
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(130) 
(131) 



(132) 
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(133) 
(134) 
(135) 
(136) 
(137) 



Various strategies can be used in order to solve system (jl32p . which combine elimina- 
tion of constraints (rigid body motions and/or MFCs) by projection methods (FETI-like 
approaches) and/or by recondensation methods (FETIDF like approaches). All these meth- 
ods correspond to solving rather complex coarse problems but they are suited to traditional 
preconditioners. We propose, after [93j, to use classical projection method to handle rigid 
body motions then use iterative solver to find simultaneously (i^p, A^,/i) and provide effi- 
cient preconditioner to this problem. 
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System reads with trivial notations (for simplicity reasons we suppose that the rigid 
body motion constraints have been symmetrized, which is always possible and which is 
naturally the case if Gp = like in many applications): 

V ^] = (t) (138) 
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(139) 



As can be seen, MPCs have very different actions wether they are set on primal interface 
degrees of freedom or not: matrix Cp modifies the structure of the hybrid system while 
dual and internal constraints lead to classical hybrid approach with modified dual trace 

f Ajf'\ 

and assembly operator A = ("^."^ )■ The definition of efficient preconditioner inspired 

by classical methods is then much simplified if no constraints are set on primal degrees of 
freedom {Cp = 0), which is what we suppose now: 

A„ 0\ f Ar, 0^ 



Spd_,= \J :^jS;,\J (140) 

The notation is due to the association of the trace operator with the assembly operator, 
which in fact is equivalent to defining "extended interface dual degrees of freedom" made 
out of dual degrees of freedom and degrees of freedom involved in MPCs. Since, in this 
hypothesis, system reads like classical hybrid approach, that is (modified) assembly of local 
contributions, the proposed preconditioner is a scaled assembly of local inverses. 

0\ fK, 0' ^ 



The primal scaled assembly operator can be directly imported from the primal approach. 
Concerning the dual approach, according to previous definitions, we have: 

A = [AMf^A^) ^ ^*M|-^ (142) 

Where M? is a diagonal matrix chosen like in the classical methods. The matrix to inverse 
reads: 

[^AMp )-[cpMI-Hf^ CpM^'cTj ^^^^^ 

The idea is then to make this system easy to inverse, having the off-diagonal blocks equal 
to zero. We have Cp = (C^ C,^) with C^A^ = Ca; if we choose 

Cd = Cd (A,i^M2tf Aj) A^m^tf (144) 

then A^t^Mp~^Cp = and the scaled assembly operator is non expensive to compute. 

In others words, one simply has to split constraints on interface degrees of freedom 
between subdomains according to the scaling used inside the preconditioner. 
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5.4 Choice of decomposition 

Decomposing a given structure in order to use the algorithms presented in this paper is 
a complex problem. Algorithms and softwares were proposed [44:\ [T2l I62j . which mostly 
refer to graph theory. Such approaches enable to take into account load balance between 
processors (supposing each processor is assigned to one subdomain, this is equivalent to 
making each local problem as easy to solve as others) and to minimize the dimension of the 
interface (so that the global condensed problem is as small as possible). They also enable 
to avoid internal rigid body motions (so called mechanisms). 

Anyhow another important point is to have local problems as well conditioned as pos- 
sible, so having subdomains with good aspect ratio (ratio between largest and smallest 
characteristical dimensions of the subdomain) is considered as an important point. Any- 
how one has to realize that good aspect ratio is often linked to large local bandwidth and 
then to somehow expensive local problems to solve. 

An even more difficult point to take into account high heterogeneities (see figure [13]) : 
using stiff'ness scaling enables to correctly handle heterogeneity when interface between 
subdomains matches interface between materials, anyhow when interface between subdo- 
mains "crosses" interface between materials then numerical difficulties may occur. For now 
scaled-average optional constraints seem to be the best solution to handle these difficulties 
but it leads to large coarse problems. 



T I 

T 



I I I T 

(a) Interface avoiding (b) Interface match- (c) Interface crossing 
heterogeneity ing heterogeneity heterogeneity 

Figure 13. 

Different kinds of heterogeneity in domain decomposition context 

Finally finding the best decomposition is still a rather open problem and mechanical 
sense is often a necessary complement to efficient automatic decomposing algorithm. 



5.5 Extensions 

5.5.1 Nonsymmetric problems 

Nonsymmetry occurs in many physical modeling: plasticity, nonlocal models for fracture 
|52j . frictional contact [3]. The use of the domain decomposition methods presented in this 
paper just requires more care in the implementation because some simplifications are not 
available (for instance coarse problem matrices are nonsymmetric), and of course the use 
of well suited iterative solver like GMRes, OrthoDir or BiCG because Schur complements 
are no longer symmetric. 

Globally methods show good numerical performance results. Anyhow a real problem 
is the absence of theoretical results to ensure good convergence properties (this is mainly 
due to the fact that proofs for classical methods rely on the construction of an interface 
inner-product related to Schur complements which is no longer possible). 
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5.5.2 Nonlinear problems 

For now we have considered the solution to linear systems. To adapt the method to nonlin- 
ear problems, a classical solution is to use Newton-Raphson linearization scheme: linearized 
stiffness matrices are computed independently on each subdomain then the linearized sys- 
tem is solved using domain decomposition [85l [581 [881 SQl [59] . Foi' such approaches, domain 
decomposition methods can be seen as efficient black-boxed linear solvers. 

One critical point in such a method is that, depending on the formulation (for instance 
fully Lagragian formulation of a large deformation elasticity problem) rigid body motions 
may vary from one system to the other. In the proposed context, what has been proved 
is that translations always belong to zero energy modes, what has been observed is that 
rotations only appeared as zero-energy-modes in the first system (which corresponds to 
linear elasticity problem). This might be penalizing because the size of the information 
transmitted inside the coarse problem is decreased after the first system; moreover, rota- 
tions are often converted to "negative-energy modes" which, if they are in small number, 
can be handled by fully-reorthogonalized conjugate gradient (though convergence will be 
slower). One classical solution is to reinject disappeared rotations as optional constraints 
(via augmentation algorithms). 

In the case where nonlinearity is localized in few substructures, an interesting strategy 
can be to carry out subiterations of the nonlinear solver independently in those substruc- 
tures [I3l[l8]. 

5.6 Implementation issues 

Implementing domain decomposition methods from existing code is not a too complex 
task. We give a few details on our software architecture though practical solutions are 
far from being unique. Our code is a plug-in to ZeBuLoN object-oriented finite element 
computational software |79i I78j . it takes advantage of Frederic Feyel's previous work \Ab\ 
I46j . Our implementation aims at being as generic as possible, so for now hybrid domain 
decomposition method has been developed (and mixed approaches are under construction), 
and separation between formulation and solver (so that any iterative solver can be used to 
solve the interface problem). All classical projectors and preconditioners are implemented. 
The most basic pieces of the code are: 

• topological description of the interface, ie ability to realize trace operations; 

• exchange library (PVM, MPI), ie ability to realize assembly operations; 

• classical FE code, ie ability on any given subdomain to get any local field given 
sufficient boundary conditions. 

5. 6. 1 Organization of the topological information 

In order to implement hybrid approach, we propose to define a specific class for "interface 
degrees of freedom" which wraps classical degrees of freedom and provide information on: 

• the number of subdomains which share this degree of freedom 

• the kind of treatment (primal, dual...) which is applied 

Then degrees of freedom are set together neighborwise, defining "interface" class made 
out of a list of pointer to "interface degrees of freedom", and the global number of the 
neighbor (that number enables to identify the subdomain and to realize exchanges). 

A "subdomain" is defined as a collection of "interfaces" and a classical domain in the 
sense of usual FE software (mainly mesh), it possesses its own global identification number. 
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Figure 14. Topological interface information 



Note that with such a description of the decomposition, the local interfaces are redun- 
dant, multiple degrees of freedom appear in several interfaces. It is then necessary to take 
certain care to define some operations (transposed trace for dual degrees of freedom). A 
specific class can then be used to ease the management of multiple degrees of freedom. 

5.6.2 Defining algebraic interface objects (fig. \15\) 

In order to easily connect the domain decomposition formulation to an iterative solver, 
we propose to define "interface vectors" (displacement, intereffort...), "interface matrices" 
(trace of rigid body motions..., can be seen as a collection of interface vectors) and "in- 
terface operators" (square interface matrices), with all classical operations (basically sum, 
difference, product and transposed product). 

The particularity of these objects is to be defined on the interface and then data is 
shared between subdomains, so all the previous operations sometimes require to assemble 
data (an interesting idea is to have a boolean member indicating the assembled state of 
data). Because of the choice of description of the interface, the assembly operation requires 
certain care for primal multiple degrees of freedom (these degrees of freedom shall have the 
same value at their different occurrences). Note that an object like "interface matrix" can 
highly be optimized (mainly in terms of memory storage). 

Interface operators mainly know how to multiple vectors and matrices, they are used to 
define Schur complements, scaling operators, projectors. Composite design pattern can be 
used to simplify the succession of operations. 

In order to let the user choose the various configurations of the domain decomposition 
method, we use inheritance and object factory design pattern. 

5.6.3 Articulation between formulation and solver (fig. \16\) 

What we propose is to have client /server relationship between solver and formulation: 
basically an iterative solver needs to know how to initialize, how to multiply, how to pre- 
condition, how to do inner products, how to evaluate convergence. All these operations are 
implemented inside the "interface formulation" object which is linked to one subdomain 
(topology and stiffness) and creates "interface algebraic objects" in order to define required 
operations. 
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Figure 15. Algebraic interface objects 
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Figure 16. Articulation between formulation and solver 
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Figure 17. 16 subdomains decomposed square 



6 Assessments 

The assessments we propose here are based on the code described in the previous section. 
Basically, we have implemented the hybrid approach which lets us assess the classical primal 
and dual approaches with most classical preconditioners and projectors. 

We first present a sequence of academical tests in order to recover classical numerical 
performance results (scalability and relative efficiency of the different approaches): two 
dimensional plane stress problem, plate problem, three dimensional problem with hetero- 
geneity and unstructured decomposition. In these problems, B. denotes the characteristic 
size of the subdomains and h the characteristic size of the elements. We also present results 
on non-academic problem (bitraction test specimen). 

For all those tests, in order to compare all the approaches (including the hybrid ap- 
proach), GMRes solver is used and convergence is monitored by the norm of the residual 
as given by the solver (with e set to 10~^). In other cases (when hybrid approach is not 
assessed), the convergence is monitored through the evaluation of global primal residual 
\Ku — /I/I/I < e with e set to 10~^. Depending on the method, a different coarse prob- 
lem may be introduced, we denote by CS:a+b the size of the coarse problems (a for the 
admissibility coarse problem and b for the optimal preconditioning coarse problem) or 
num6er_o/_iteraiions^-Qtj^i_jjyjn|-,gj,_o£_j,ojjg^j.^i^^g. Note that the hybrid approach deals with two 
independent coarse problems, so their solutions is much cheaper than the solution to a 
unique large coarse problem. 

We test the primal approach with or without optimality coarse problem, the dual ap- 
proach with lumped or Dirichlet preconditioner and identity or superlumped or Dirichlet 
projector (denoted by P(I), P(W) and P(D) respectively). As for such tests no physical 
consideration can guide the choice of hybrid treatment to the interface, in order to show 
the potential of the hybrid approach, we present results where all degrees of freedom of 
one direction (C/i, or C/3) are treated in the same way. For instance "D-P" stands for a 
dual treatment for degrees of freedom associated to direction \J\ , and a primal treatment 
for degrees of freedom associated to direction C/2. 

6.1 Two dimensional plane stress problem 

We consider a simple second order two-dimensional problem, the structure is an homoge- 
neous square decomposed in square substructures meshed with linear square finite element 
(Ql Lagrange). The behavior is linear elastic (Young modulus E = 200000 MPa and Pois- 
son coefficient v = 0.3), the loading consists in clamping on the left side and punctual effort 
on the top right corner (figure [T7|) . 
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MethxKr~~~ — — 
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16 


32 


64 


Primal 


SC:0+0 
SC:0+36 


44 
11 


45 
12 


45 
14 


45 
15 


Dual 
SC:36+0 


Lumped - P(I) 
Dirichlet - P(I) 
Dirichlet - P(D) 


14 
13 
12 


25 
15 
14 


32 
17 
15 


42 
20 
17 


Hybrid D-P 


SC:12+0 - P(D) 
SC:12+12 - P(D) 


29 
12 


30 
14 


33 
16 


35 
18 


Hybrid P-D 


SC:12+0 - P(I) 
SC:12+12 - P(I) 
SC:12+0 - P(D) 
SC:12+12 - P(D) 


29 
14 
26 
12 


32 
17 
29 
14 


36 
20 
31 
16 


38 
22 
33 
18 



Table 1. Scalability results in 2D / 16 subdomains 
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Primal 


No opt. coarse 
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63o 
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(Neumann^) 


With opt. coarse 


86 


10l8 


1236 
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1490 


14l26 


15l68 




Lumped - P(I) 


186 


24i8 


2636 


2760 


2990 


29i26 


3I168 


Dual 


Dirichlet - P(I) 


96 


13l8 


1536 


1660 


1790 


18126 


19l68 




Dirichlet - P(D) 


96 


12l8 


1436 


1560 


I690 


17l26 


I8168 


Hybrid D-P 


No opt. coarse 


92 


2l6 


30i2 


4020 


5O30 


6O42 


6756 


P(D) 


With opt. coarse 


74 


12l2 


1424 


1640 


1760 


1834 


I8112 


Hybrid P-D 


No opt. coarse 


92 


206 


29i2 


3820 


4830 


5742 


5756 


P(D) 


With opt. coarse 


74 


12l2 


1424 


1640 


1760 


1884 


17ll2 



Table 2. Performance results in 2D for given ^ — 16 



Table [T] shows the number of iterations of available strategies for different number of 
elements per subdomain (for a 16-subdomain decomposition), and table [2] for different 
number of subdomains (for given ratio H/h = 16). Globally all approaches (primal, dual 
and hybrid) equipped with their best preconditioner and projector behave similarly and 
are scalable. Note that even in its optimal configuration the hybrid approach requires a 
smaller coarse space (for instance, table [H two 12 x 12 coarse problems against one 36 x 36 
coarse problem for primal or dual approaches) for equivalent efficiency. As expected if the 
optimality coarse problem is suppressed, performance results decay and scalability is lost. 
Finally for such simple problems, the simplified versions of the dual approach gives excellent 
results. 

6.2 Bending plate 

We consider a forth order plate problem, the structure is an homogeneous square decom- 
posed in squared substructures meshed with square Mindlin plate element. The behavior 
is linear elastic (Young modulus E = 200000 MPa and Poisson coefficient u = 0.3), the 
loading consists in clamping on the left side and punctual normal effort on the top right 
corner. 

Table [3] presents the number of iterations for the dual and primal approaches, with or 
without optional corner constraints (the subscript indicates the total size of coarse problems, 
ie rigid body motions and corner modes). As predicted, corner constraints are essential in 
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2O108 
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P(I) - No corner 
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Dual 


P(I) - With corners 


16l5 


2448 
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29i68 


3I255 


33360 
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P(D) - No corner 


15l2 


2536 
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43i2o 


5I18O 
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P(D) - With corners 


14l5 


2148 


2899 


3I168 


3I255 


36360 



Table 3. Bending plate: performance results for given — = 8 



order to make the algorithms scalable. Anyhow the dimension of the coarse space associated 
to corners quickly explodes which makes the methods less interesting from a CPU time point 
of view, which justifies the FETIDP philosophy which leads to much smaller coarse spaces. 



6.3 Heterogeneous 3D problem 

We consider a 3D problem, the structure is an heterogeneous cube decomposed in 3 x 3 x 3 
cubic substructures meshed with 3x3x3 Q2-Lagrange cubic elements (27 nodes per 
element). The heterogeneity pattern is described in figure [T8b. behaviors are linear elastic 
(Young modulus Ei = 200000 MPa, £"2 = 2 MPa and Poisson coefficient v = 0.3), the 
loading consists in clamping on the bottom side and constant pressure on top side. 



Method 



Number of iterations 



Primal 



19 



Dual P(D) 



No splitting 
Classical splitting 
Condensed splitting 



28 
28 

18 



Dual P(W) 



No splitting 
Classical splitting 
Condensed splitting 



21 
21 

20 



Dual P(I) 



No splitting 
Classical splitting 
Condensed splitting 



74 
74 

73 



Table 4. Heterogeneous cube 



Table H] presents the number of iterations for the conjugate gradient to converge. As- 
sessed methods are classical primal approach and dual approach with different projectors 
for all splittings (or equivalent initializations) presented in section 13.2.41 of course stiffness 
scaling is employed. What appears clearly is the good behavior of the approaches face to 
heterogeneity except the identity projector of the dual approach (which is definitely not 
suited to heterogeneous problems), and the efficiency of the condensed initialization. For 
such a problem the superlumped projector leads to very good results, anyhow for more com- 
plex cases Dirichlet projector is necessary and shall be improved at no extra computational 
cost by the condensed initialization. 



6.4 Homogeneous non-structured 3D problem 

We consider a 3D problem, the structure is an homogeneous cube meshed with Ql-Lagrange 
cubic elements (8 nodes per element). The behavior is linear elastic (Young modulus E = 
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(a) 27 subdomains structured (b) 27 subdomains unstructured decompo- 

decomposition of a heteroge- sition of a cube 

neous cube 

Figure 18. 3D assessments 



200000 MPa and Poisson coefficient v = 0.3), the loading consists in clamping on the bot- 
tom side and constant pressure on top side. We consider two kinds of decomposition: either 
structured decompositions (3x3x3 or 4x4x4 cubic substructures) or so called "unstruc- 
tured" decompositions realized by Metis software (http : //www-users . cs . umn . edu/~karypis/nietis/ ), 
see figure [T8b. 



— Decomp osition 
Method — ----___.:f:^^___^^^^ 


Structured 


Unstructured 


27 sd. 


64 sd. 


27 sd. 


64 sd. 


Primal Neumann-Neumann 


11 108 


14288 


42 


67 


Dual Dirichlet P(I) 
Dual Dirichlet P(D) 


12l08 
12l08 


I6288 
I6288 


44 
43 


69 
70 


Hybrid D-D-P P(I) 
Hybrid P-P-D P(I) 
Hybrid D-P-D P(I) 


1472 
1572 
1372 


17l92 
19i92 
17l92 







Table 5. Homogeneous cube / influence of the decomposition 

Table [5] enables to highlight the fundamental role played by the decomposition: scalabily 
result only holds for structured decomposition; moreover there might by a huge performance 
gap between two decompositions with the same number of subdomains (factor 3 for 27 
subdomains, factor 4 for 64 subdomains). 

6.5 Bitraction test specimen 

In order to assess "real life" problems, we consider the bitraction test specimen presented 
in figure [2] (this structure, courtesy of ONERA ~ Pascale Kanoute -, was optimized with 
ZeBuLoN software in order to have stress field as homogeneous as possible in its center). 
It was decomposed with Metis software into 4 or 16 subdomains. 

As shown in table [6] all methods give excellent performance results on non-academical 
problem. Note the good behavior of the hybrid approach which gives equivalent performance 
with much smaller coarse problem, even if no physical consideration could guide the choice 
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Method 


4 sd. 


16 sd. 


Primal Neumann-Neumann 


23o+i2 


30o+69 


Dual 


Lumped P(I) 
Dirichlet P(I) 
Dirichlet P(Q) 


32i2 
25i2 
24i2 


4169 
3269 
3269 


Hybrid P(Q) 


P-P-D 
D-D-P 


3l8 

258 


4446 
3746 



Table 6. Bitraction test specimen 



of the treatment of interface degrees of freedom. 
7 Conclusions 

In this paper, we have reviewed most used non-overlapping domain decomposition methods. 
These methods are perfectly suited to modern computational hardware, they are based on 
very close concepts which we tried to outline. We introduced the hybrid framework to 
include as many methods as possible: the principle is to assign to each interface degree of 
freedom its own treatment, for now primal and dual treatments have been implemented, 
and mixed and recondensed should follow. Hybriding methods also enable to define physic- 
friendly approaches for multifield problems. 

Because of the conceptual proximity of all methods, assessments showed very close nu- 
merical performance results. Once equipped with convenient preconditioner and coarse 
problem, all the methods proved their ability to handle second and forth order elasticity in 
presence of strong heterogeneities. Though from a computational point of view some com- 
bination may be more interesting: dual approach with lumped preconditioner or simplified 
projector (if these are sufficient to ensure fine rate of convergence), hybrid approach (which 
generates smaller coarse space). We also outlined the importance of the decomposition 
even for very simple problems. The methods have also proved their efficiency on industrial 
cases, some of them were implemented in computational softwares. 

In this paper we limited to the solution to linearized systems, which anyhow enables 
to solve nonlinear problems. Another strategy is to commute the nonlinear solver and the 
domain decomposition method so that nonlinear problems can be solved independently on 
each subdomain. Another evolution of domain decomposition philosophy is the decompo- 
sition of the time interval [72] for evolution problems. 
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A Krylov iterative solvers 

Krylov iterative solvers for tiie resolution of linear systems have been widely studied. The 
aim of this section is just to briefly present important results and algorithms, reader inter- 
ested in wider documentation can refer to [UB]) and to [1| for shorter explanation. 

Krylov methods belong to the projection class of iterative algorithms, which consist in 
approximating solution S^^b of system Sx = b by vector p{S)b where p is a smartly built 
polynomial. 

In this section we consider the iterative solution to system Sx = b. S is a n x n matrix 
and b a vector in range(S'). The i^^ iteration leads to approximation Xi of the solution, 
associated residual is ri = b — Sxi = S{x — Xj). Initialization is xq (most often xq = 0). 
Canonical (orthonormal) basis of M" reads (ei, . . . ,e„). 

A.l Principle of Krylov solvers 

Krylov solvers are based on the iterative construction of so-called "Krylov subspace" fCm{S, ro) 
defined by: 



The solution to linear system consists in searching Xm under the following constraints: 



where the choice of the orthogonality relationship enables to define various approaches. 
A. 2 Most used solvers 

We herein present two of the principal Krylov solvers. First GMRes (97] which is suited to 
any type of matrix, then conjugate gradient which is adapted to symmetric definite positive 
matrices. 

Of course, the iterative solution to a linear system assumes that a convergence criterion 
is employed, and that a limit of precision is set so that the system is supposed to have 
converged once the criterion is below this precision. We note e this limit value of the 
criterion. 

A.3 GMRes 

Algorithm GMRes (alg. lA.ip consists in an oblique projection based on the construction 
of Krylov subspace ICm{S,vo) with vq = 7'o/||ro||2- The research principle is: 



which is equivalent to finding Xm £ xq -\- ICm{S,rQ) minimizing ||rm||2- 

A particulary striking property of GMRes is not to compute the approximation at each 
iteration, a smart implementation of GMRes enables to directly access the norm of the 
residual ||rj||2. Only the final approximation Xm is computed (by the inversion of a m x m 
upper triangular matrix). From the computation complexity point of view, each iteration 
consists in a full orthonormalization of vector Wj with respect to ICj. 

Algorithm GMRes(m) (or restarted GMRes) consists in stopping computation before 
convergence at a priori fixed step m and restarting computation using previous Xm as 
initialization. This strategy aims at minimizing orthogonalization computations by limiting 
the size of Krylov subspaces [29]. This method may lead to stagnation for non positive 
definite matrices. 








(3) 
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Algorithm A.l GMRes 

1: Compute ro = 6 - Sxq, vo = ?"o/||ro||2 
2: for j = 0, . . . , m — 1 do 
3: Compute Wj = Svj 
4: for i = 0, . . . , j do 

5: hij = {vi,Wj) 

6: = Wj — hijVi 

7: end for 

8: hi^j^^-^j = \\wj\\2 

9: if \\rj II2 ^ e then 

10: stop 

11: else 

12: Vj+i = Wj/\j+]^)j 

13: end if 
14: end for 

15: Compute Um minimizing ||||ro||2ei — -ffm?/||2 set Xm = xq + VmU-i 



A. 4 Conjugate gradient 

Let 5 be a symmetric positive definite matrix, a conjugate gradient algorithm consists in 
an orthogonal projection. The research principle is: 

Xm e Xq +)Cm{S,rQ) 

-L ICm{S,ro) 

which is equivalent to finding Xm £ xq + JCmiS,rQ) minimizing \\xm — x\\s- 

Because of the properties of S, conjugation (orthogonality) properties appear, leading 
to algorithm IA.2[ The algorithm is based on the construction of various basis of lCm{S, tq) : 



Algorithm A. 2 Conjugate gradient 
1: Compute ro = 6 — Sxq, set wq = vq 
2: for j = 0, . . . ,m do 

3: aj = {rj,rj)/{Swj,Wj) 

4: Xj+i = Xj + ajWj 

5: rj+i = rj - ajSwj 
6: f^j = irj+i,rj+i)/{rj,rj) 
7: Wj+i = rj+i + /3jWj 
8: end for 



(rm) (residual basis) is orthogonal, (wm) (research direction basis) is S'-orthogonal. Step 6 — 
7 of algorithm IA.2I is the S-orthogonalization of Wj+i with respect to Wj which theoretically 
implies the orthogonality of Wj-^-i with respect to all previous research directions. However 
this orthogonality property is numerically lost as the number of iterations increases, it is 
then more suited to use a full orthogonalization of research directions leading to algorithm 

EH 

Full reorthogonalization is often compulsory for complex simulations. Various imple- 
mentations are available (among others Gram-Schmidt, modified Gram-Schmidt, iterative 
Gram-Schmidt [7HI51j) depending on the chosen ratio between precision and computational 
cost. Our experience leads us to prefer modified Gram-Schmidt algorithm (the one used in 
algorithm lA.ip to classical Gram-Schmidt (the one used in algorithm lA.Sp . Note that once 
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Algorithm A. 3 Reorthogonalized conjugate gradient 

1: Compute ro = b — Sxq, set wq = vq 

2: for j = 0, . . . , m do 

3: aj = {rj,rj)/{Swj,Wj) 

4: Xj+i = Xj + ajWj 

5: rj+i = rj - ajSwj 

6: For ^ -i ^ j, /3j = -{rj+i,Swi)/{wi,Swi) 

7: Wj+i = rj+i + YlUi l3'jWi 

8: end for 



fully reorthogonalized, conjugate gradient is almost as expensive as GMRes. Anyhow con- 
jugate gradient provides the approximation at each iteration, which can be very useful (see 
for instance section 13.2.31 where such an information enables the computation of relevant 
convergence criterion) . 

A. 5 Study of the convergence, preconditioning 

Because of their error-minimization property, conjugate gradient and GMRes have conver- 
gence theorems with known minimal convergence rate, for instance for conjugate gradient: 

m 

\\x-xo\\s (5) 

where k is the condition number of matrix S. Condition number is the ratio between the 
biggest and the smallest eigenvalues. 

with I All ^ IA2I ^ ... ^ |A„| eigenvalues of S (6) 

Moreover performance results of Krylov iterative solvers are strongly linked to the spec- 
trum of matrix S. More precisely only the active spectrum (set of eigenvalues which the 
right hand side is not orthogonal to the associated eigenvectors) influences the convergence; 
condition number k can be replaced with active condition number Kact inside relation ([5]) 
leading to better convergence range. More precise study would lead to the introduction of 
Ritz spectrum and efl^ective condition number |107j . 

These simple considerations are sufficient to explain the interest of preconditioning the 
system: the idea is to solve equivalent system S~^Sx = S~^b where is a well-chosen 
matrix providing the system with better spectral properties (if ~ then condition 
number is optimal, which justifles the notation). 

For conjugate gradient, the use of preconditioner may seem problematic since the sym- 
metry is a priori lost. However if preconditioner is symmetric definite positive, ap- 
plying conjugate gradient to nonsymmetric system is equivalent to a symmetric resolution 
{{L~'^ S L~^){Lx) = L~^b with Cholesky factorization S = L) and the method is still 
relevant. 

So preconditioning the above two algorithms is simply realized replacing S by S^'^S and 
b by S^'^b in lines 1 and 3 of algorithm IA.1[ and in lines 1 and 5 of algorithm IA.3I (anvhow 
the research directions are still S-orthogonal) . Of course the main problem remains the 
definition of an efficient preconditioner. 
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A. 6 Constrained Krylov methods, projector implementation 

We may deal (for instance in the dual approach) with constrained systems such as: 

A o) (:) - (^) 

Because constraint G'^xq = e is compulsory, it is often referred to as "admissibility con- 
straint". A classical solution is to find an initialization xq which satisfies constraint and 
then ensure that the remainder of the solution is researched inside a supplemental space: 
G'^{xi — Xq) = 0. a projected algorithm naturally arises: 

X = xo + Px* (8) 
G'^xq = e (9) 
G'^P = (10) 



which leads to: 



QG{G'^QG) \ (11) 
P = I -QG{G'^QGy^G^ (12) 

where Q is a matrix so that matrix G^QG is invertible. Iterative system then reads: 

P^SPx* = P'^{b-Sxo) (13) 
Q can be post-computed a = {G^QG) G^Q [h - Sx). 
A. 7 Augmented-Krylov methods, projector implementation 

Augmented-Krylov methods |14l |95] are employed to add optional constraints to the res- 
olution of a system. The principle is to set subspace C in M" of dimension ric repre- 
sented by n X TT-c rectangular matrix G (range(C) = C, for more simplicity we suppose 
that G is full-ranked-column) , then to define augmented-Krylov subspace ICm{S,ro,G) = 
ICm{S,rQ) + range(C), and to use the following research principle: 

Xm G xq^+ ICm{S, ro, G) ^^^^ 
-L? ICmiS,ro,C) 

Augmented-Krylov methods can be implemented either by reorthogonalization schemes 
or by projection methods which are the one we propose to present here. The research 
space is separated into two subspaces: range(C) and a supplemental subspace. The part of 
the solution in range(C) is detected during initialization, while the remainder is iteratively 
looked for, projector P ensures the research is realized inside the correct subspace. 

X = xq + Px* (15) 
G'^ro = G^{b - Sxo) = (16) 
G^SP = (17) 



Which leads to: 



xo = G{G^SG) ^G^b (18) 
p = I - G {G^ SG)'^ G^ S (19) 
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system then reads: 

SPx* = b-Sxo (20) 
or P^SPx* = (6 - Sxo) = P^h (21) 

Though it can be proved that projected system is better conditioned than original prob- 
lem [25j, the efficiency of the method essentially depends on the choice of matrix C, which is 
most often an opened problem. Within the framework of domain decomposition methods, 
this choice can be guided by several considerations. In the framework of multiresolution, 
reuse of previous numerical information can lead to very interesting performance results 

[HsllMllSSlElllsSllHZj 

A. 8 Constrained augmented Krylov methods 

We here consider solving constrained system ([7]) with C-augmented algorithm. The admis- 
sibility constraint is often referred to as first level constraint and augmentation as second 
level constraint. Two strategies are possible, the first consists in mixing levels together 
while the second respects the hierarchy between constraints. 

One projector strategy: set J = {G S^H) and e = (^jjTj^ j then system reads: 

^ o) il) - (^) (2^) 

the following initialization/projection are employed (Q is a parameter to tune) 

xo = J{J^Qjy^e (23) 
P = I -Qj{j^Qjy^ (24) 

Because Q is not easy to interpret and choose, this method is hardly ever used. 

Two-projector strategy: the two conditions are imbricated. First ensure admissibility 
constraint 

X = xo + Px* (25) 

XQ = G{G^QG)~\ (26) 

P = I -QG{G^QG)~^G^ (27) 



then set 



X* = x*Q + P*x** (28) 
= C {C^P^SPG)'^ G^P^ {b- Sxo) (29) 



X, 



P* = I - PG {G^P^SPC) ^C^P^S (30) 

so that optimality constraint is verified. As can be seen such an approach is equivalent 
to classical augmentation with making second level constraints consistent with the first 
(setting C* = PG). 



